Abstract


This  thesis  investigates  adaptive  compiler  systems  that  perform,  during 
program  execution,  code  optimizations  based  on  the  dynamic  behavior  of  the 
program  as  opposed  to  current  approaches  that  employ  a  fixed  code 
generation  strategy,  i.e.,  one  in  which  a  predetermined  set  of  code 
optimizations  are  applied  at  compile-time  to  an  entire  program.  The  main 
problems  associated  with  such  adaptive  systems  are  studied  in  general:  which 
optimizations  to  apply  to  what  parts  of  the  program  and  when.  Two 
different  optimization  strategies  result:  an  ideal  scheme  which  is  not  practical 
to  implement,  and  a  more  basic  scheme  that  is. 

The design of a practical system is discussed for the FORTRAN IV
language. The system was implemented and tested with programs having
different behavioral characteristics. In order to have a basis for comparing
the results, variants of the system were constructed which approximate the
behavior of the WATFIV, FORTRAN IV G, and FORTRAN IV H compilers. The test
programs were run under these systems. The results show that adaptive
FORTRAN performs as well as or better than any of the variant systems at each
specific test point, and significantly better than any one of them across the
entire range of test points.

Chapter I


Introduction 


A serious disadvantage of current compilers is that they do not take
into account a program’s behavior in the generation of object code. In
particular, the code generation phases of these compilers employ a fixed
compile-time strategy, i.e., the degree of code optimization is predetermined
and the optimizations are applied uniformly to each section of a program,
independent of how often the section is executed. As a consequence, special
purpose compilers have been designed to handle a specific class of programs
or to meet specific needs, and the decision of which compiler to use is
placed upon the user. For example, for the FORTRAN language there exist
on the same machine three special purpose compilers having different
trade-offs between compile time and code efficiency, viz., WATFIV,
FORTRAN-IV G and FORTRAN-IV H. WATFIV is designed to handle jobs for
which compile time is a major factor. FORTRAN-IV G produces fairly efficient
code by applying some local optimizations. FORTRAN-IV H is designed for
production programs. It produces highly efficient code, but there is a
substantial increase in compilation time.

This thesis investigates a dynamic run-time code optimization strategy
based on the dynamic behavior of the program. Motivation for such a
system stems from the empirical evidence produced by the research of
Knuth [Knu70], Darden and Heller [Dar70], Ingalls [Ing71] and Jasik [Jas71],
viz., that a small part of a program (<5%) accounts for a large part of its
execution time (>50%). Their schemes can be classified as "iterative
optimization", which involves a feedback loop between the system and the user.
The user’s program is monitored via software or hardware, and the system
produces an execution profile of where the program is spending its time.
Using this profile, the user optimizes his program and runs it again, obtaining
another profile; and so forth. Major drawbacks to such an approach are the
limitations placed on the amount of optimization the user can perform, and
the inclusion of the user in the feedback loop. We advocate removing the
user from this feedback loop and automating the process.

Our major goal is to demonstrate not only that it is possible to
construct such an automated system, but that it is worthwhile, i.e., that it can
perform, for almost all programs, as well as or better than any special purpose
compiler employing a fixed code generation strategy.

1.1  Current  Optimization  Techniques 

The development of code optimization strategies has been under
investigation since 1965. This initial research culminated in a set of machine
independent optimizations that are applicable to most high level languages
[cf. All69]. The development of more efficient algorithms for these "classical"
optimizations has been the object of study by other investigators, notably
Lowry and Medlock [Low69] and Cocke and Schwartz [Coc70]. The
effectiveness of these optimizations was clearly demonstrated in the
FORTRAN-IV H compiler of Lowry and Medlock, who stated that even though
there was a 40% increase in compilation time, the object code was 25%
smaller and executed three times faster than that produced by the
FORTRAN-IV G compiler.

The  goal  of  this  research  is  to  demonstrate  the  effectiveness  of 
applying  code  optimization  at  run-time  instead  of  at  compile-time.  It  suffices 
to  select  optimizations  from  among  the  "classical"  optimizations,  for  they  are 
just  as  applicable  at  run-time.  There  were  a  number  of  selection  criteria 
that  are  worth  mentioning.  Foremost,  we  wanted  to  include  enough  machine 
independent  and  dependent  optimizations  to  produce  results  of  broad 
significance. Also, the optimizations must have been proven by others to be
effective,  i.e.,  they  produce  a  significant  decrease  in  execution  time  for  the 
effort  expended. 

The  set  of  machine  independent  optimizations  selected  were: 

1)  Constant  Folding:  performing  operations  whose  operands 
are  known.  This  technique  is  particularly  beneficial  for 
code  generated  to  calculate  the  address  of  an  array 
element. 

2)  Common  Subexpression  Elimination  (CSE):  eliminating 
redundant  expression  computations. 

3)  Code  Motion  (CM):  moving  operations  invariant  within  a 
loop  outside  the  loop. 
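
The first two optimizations in the list above can be illustrated on a single basic block. The following is a minimal sketch, not the thesis's implementation: it assumes a hypothetical quadruple form `(op, left, right, dest)` in which every `dest` is a temporary assigned exactly once, and the function name `optimize_block` is invented for illustration. Code motion is not shown because it needs loop information.

```python
import operator

# Hypothetical quadruple form: (op, left, right, dest). Operands are numeric
# constants or names; each dest is assumed to be assigned only once.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def optimize_block(quads):
    """Apply constant folding and local CSE to one basic block."""
    known = {}   # name -> constant value discovered by folding
    seen = {}    # (op, left, right) -> name that already holds that value
    alias = {}   # name -> earlier name computing the same expression
    out = []
    for op, left, right, dest in quads:
        # Substitute known constants and earlier equivalent names.
        left = known.get(left, alias.get(left, left))
        right = known.get(right, alias.get(right, right))
        if isinstance(left, (int, float)) and isinstance(right, (int, float)):
            known[dest] = OPS[op](left, right)   # constant folding
            continue
        key = (op, left, right)
        if key in seen:
            alias[dest] = seen[key]              # common subexpression
            continue
        seen[key] = dest
        out.append((op, left, right, dest))
    return out, known, alias

# An address-style computation: t1 is foldable, t3 repeats t2.
quads = [
    ("*", 4, 10, "t1"),
    ("*", "i", "t1", "t2"),
    ("*", "i", "t1", "t3"),
    ("+", "t2", "t3", "t4"),
]
optimized, known, alias = optimize_block(quads)
```

Four quadruples reduce to two: the multiplication by a folded constant and one addition that reuses `t2` for both operands.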

The set of machine dependent optimizations selected (which are
applicable to most machines) are:

1) replacing a multiplication or division by a power of 2
with a shift.

2) setting memory to 0 or -1 by special instructions.

3) delaying negation operators to exploit load and store
negative instructions.

4) deleting multiplications by 1 or additions of 0.

5) performing operations directly to memory, e.g.,
incrementation or decrementation by a small constant.

6) use of index registers for DO loops and for accessing
array elements.

7) effective use of registers by an appropriate register
allocation policy.
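
Items 1 and 4 of this list are simple pattern rewrites on individual instructions. The sketch below shows them on the same kind of hypothetical `(op, left, right, dest)` quadruple; the opcode names `"shl"` and `"copy"` are assumptions made for illustration. Division by a power of 2 is deliberately left out, since an arithmetic shift and a truncating division round differently for negative operands.

```python
def peephole(quad):
    """Rewrite one (op, left, right, dest) quadruple, or return it unchanged."""
    op, left, right, dest = quad
    if op == "*" and isinstance(right, int):
        if right == 1:
            return ("copy", left, None, dest)            # x * 1 -> x
        if right > 1 and (right & (right - 1)) == 0:
            shift = right.bit_length() - 1               # right == 2**shift
            return ("shl", left, shift, dest)            # x * 2**k -> x << k
    if op == "+" and right == 0:
        return ("copy", left, None, dest)                # x + 0 -> x
    return quad

by_eight = peephole(("*", "i", 8, "t1"))
by_one = peephole(("*", "i", 1, "t2"))
plus_zero = peephole(("+", "i", 0, "t3"))
by_six = peephole(("*", "i", 6, "t4"))
```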

The  algorithms  for  the  selected  optimizations  have  certain  characteristics 
that  influence  the  design  and  structure  of  any  system  which  employs  run-time 
optimization.  First,  the  algorithms  do  not  operate  on  the  program  source 
text,  but  on  some  intermediate  form.  The  compiler  must  generate  this 
intermediate  form  (regardless  of  when  the  optimizations  are  applied).  Since 
these  optimization  algorithms  are  to  be  invoked  at  run-time,  the  intermediate 
form  was  chosen  so  that  it  could  be  directly  executed  (interpreted). 

Second, the algorithms do not operate at the basic instruction level, but
on aggregates of instructions or groups of aggregates (loops). The compiler
will have to decompose the program into these basic aggregates.

Third, certain algorithms rely on control flow analysis. The compiler (or
loader) will have to generate a form for encoding the flow relationships. The
form we will use is a directed graph.

Finally,  the  optimizations  can  be  applied  individually,  and  usually  must 
be  applied  in  a  given  order.  These  two  characteristics  are  important  in  that 
they  allow  for  gradual  optimization  of  the  program,  a  concept  fundamental  to 
our  approach  which  is  predicated  on  and  supported  by  recent  empirical 
results  on  program  behavior. 

1.2 Empirical Results on Program Behavior

Recent investigations by Knuth [Knu70], Ingalls [Ing71] and Darden and
Heller [Dar70] found that a small portion of the code in typical programs
accounted for most of the execution time. Knuth studied a varied collection
of FORTRAN programs covering a wide variety of applications, and found that
less than 4% of a program accounts for more than 50% of its execution time.
He suggested that the system produce a program’s profile, i.e., a histogram
showing the frequency counts of the executable statements, which can reveal
where the program is spending its time. This information would then be
used by the user or compiler in deciding what part of the program to
optimize.
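
A profile of this kind can be summarized by asking what fraction of the statements covers a given share of the total. The sketch below is only an illustration: the function name `hot_fraction` and the sample counts are invented, and treating a frequency count as a stand-in for execution time is a simplifying assumption.

```python
def hot_fraction(counts, share=0.5):
    """Smallest fraction of statements whose counts cover `share` of the total."""
    total = sum(counts)
    if total == 0:
        return 0.0
    covered = 0
    # Take statements from most to least frequently executed.
    for n, count in enumerate(sorted(counts, reverse=True), start=1):
        covered += count
        if covered >= share * total:
            return n / len(counts)
    return 1.0

# Invented profile: 96 statements executed once, 4 executed many times.
profile = [1] * 96 + [50, 30, 20, 10]
fraction = hot_fraction(profile)
```

For this invented profile, the four hottest statements out of 100 already account for more than half of all executions.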

Ingalls participated in Knuth’s investigation and his paper pursues the
notion of a system producing the execution profile of a program. He
concludes that current optimizations have taken us about as far as is
worthwhile, and that if further gains are to be made, optimizations such as
in-line I/O editing or expansion of subroutines in-line should be developed, or
the system should produce feedback information (i.e., an execution profile) to
the user that tells him where his program is spending most of its time. He
found that for all the programs studied, 3% of the statements made up 50%
of the program’s execution time.

Darden and Heller studied the performance of two compilers and an
assembler, and found that for the systems tested, at most 3% of the code
accounted for more than 60% of the execution time. These percentages are
taken from their graphs given in Figure 1.1. They advocated producing a
histogram of processor time by blocks of memory locations. Using this
profile, the user would optimize the critical sections of the code and run the
system again. This iterative optimization procedure would be repeated until
there was little improvement in overall performance. They tried this
technique on an ALGOL compiler and found that after four iterations they had
improved the compiler’s speed by a factor of 10 while only rewriting 5% of
the code.

Figure 1.1 Cumulative distribution of CPU time. For a typical FORTRAN
compiler, over 60 percent of the central processor’s time is spent in
executing only 1 percent of the code. Clearly, that 1 percent of the code is
the area to optimize. In fact, 10 percent of the code accounts for 90
percent of the execution time of all the systems tested by the authors.
(Reprinted from COMPUTER DECISIONS, October, 1970, pages 29-33, copyright
1970, Hayden Publishing Co.)

An inference from these empirical results is that the amount of effort
that should be expended to optimize a section of code should be proportional
to the execution time that section represents. It is also felt that the 5%-50%
empirical rule is universally true since a wide class of problems were studied
by the various authors.

A major drawback to these approaches is that the user is included in
the feedback loop. We feel that the execution profile is a useful concept
which has other advantages (such as a debugging aid, or pointing out to the
user that he should use a different algorithm or restructure his data).
However, its utility as an optimization tool has its limitations with respect to
the user. First, it requires the user to be knowledgeable about optimization
techniques. Second, for those optimizations that cannot be performed at the
source level, the user must resort to writing in machine language which he
must learn, thus defeating the purpose of using a high level language.
Finally, the user may introduce more bugs into the program.

The user could overcome these limitations, but we assert (and this
thesis will show) that the process of using the execution profile to optimize
the appropriate sections of the program can be done automatically at the
source language level without user intervention.


1.3  An  Adaptive  Compiler 


There are a number of automated approaches that we could take. The
profile could be fed back to the compiler the next time the program is
compiled. Using the profile, the compiler could optimize the appropriate
sections of the program. However, which code sections to optimize may vary
from run to run if, for example, the program’s behavior is sensitive to its
input data. Therefore, a feedback system does not seem to provide the best
solution.

A more desirable approach would be to perform code optimizations
while the program is running. That is, the system would dynamically adapt
the compiled code in response to the program’s dynamic behavior. Such a
system we term an adaptive† system.

This thesis will show that an adaptive compiler system is a feasible and
worthwhile alternative to current compiler construction approaches. We will
first turn our attention to solving the problems of determining which section
of code to optimize, when to optimize it, and how much optimization to apply
to it. Our goal is to find a solution that minimizes the overhead incurred in
answering these questions, for we want the system to perform well across
the entire execution time spectrum. Then, in order to prove the technique is
feasible, we will discuss the design and implementation of an adaptive system
for the FORTRAN-IV language. To show the adaptive FORTRAN system is
worthwhile, its performance will be measured on a variety of test programs
having different characteristics. In order to evaluate the performance
measurements in a meaningful and unbiased manner, the adaptive compiler will
be transformed into systems that generate code comparable to that produced
by WATFIV, FORTRAN-IV G and FORTRAN-IV H. Then the test programs will
be run under these systems and the performance measurements compared
with those of the adaptive system.

† The term adaptive is not meant to imply that the compiler self-adapts to
its environment, i.e., keeping statistics on the constructs used most frequently
and thereby producing more efficient code for them.




Chapter  II 

Adaptive  Compiler  Systems 


In this chapter we will look at the problems associated with constructing
an  adaptive  compiler  and  present  solutions  that  can  be  realistically 
implemented.  The  basic  issues  that  we  will  address  are: 

1) what information must be collected during the
translation and loader phases to facilitate run-time
optimization,

2) the characteristics of the code the translator must
produce for the optimizers,

3) methods for grouping the code into blocks to facilitate
its processing by the optimizers,

4) attributes of code blocks that can be metered to
determine which blocks to optimize,

5) methods for determining which optimizations to apply to
the code blocks and when,

and 6) control of the running program so optimization can be
intermixed with execution.


Three of the issues, viz., determining which code blocks to optimize,
when, and how much, form the basis for any dynamic optimization strategy.
We will discuss two strategies. The first, iterative dynamic optimization (see
Section 2.4), is based on a mathematical model, which represents an exact
formulation for solving the problem of what to optimize and how much, but
not when. The scheme is impractical to use (see Section 2.4.2), but it is
presented because the solution of such a formulation, regardless of how
inefficient, would give us a standard against which to compare other schemes.
It is possible to obtain such an absolute measure of performance (see Chapter 5),
but even for one program it would require a tremendous amount of work.
Since dynamic optimization has never been studied before, it was felt that
the primary goal of the thesis was to see if the approach was valid instead
of spending time to obtain the best performance curve for a few programs.
Therefore, we formulated a more direct approach, the incremental dynamic
optimization scheme (see Section 2.5), which incurs very little overhead. It is
a heuristic approach based on the notion that one optimization at a time
should be applied to a code block, and the assumption that the execution
time per program run for a code block is proportional to the frequency with
which it is executed. Such an assumption allows a frequency count to be
used as a metric for controlling the adaption process. This count is a
function of the code block’s attributes, such as size, level of nesting, etc.
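
The incremental idea (count executions, and apply one more optimization when a count is reached) can be sketched as follows. Everything specific here is an assumption made for illustration and is not taken from the thesis: the placeholder optimization names, the `Block` class, and above all the `threshold` formula, which merely shows a count that depends on size and nesting level.

```python
# Placeholder names for an ordered sequence of optimizations (hypothetical).
LEVELS = ["opt1", "opt2", "opt3"]

class Block:
    def __init__(self, name, size, nesting):
        self.name, self.size, self.nesting = name, size, nesting
        self.count = 0   # executions since the last optimization step
        self.level = 0   # number of optimizations applied so far

def threshold(block):
    """Assumed rule, for illustration only: larger blocks must run more
    often before being optimized; deeply nested blocks are promoted sooner."""
    return max(1, 10 * block.size // (1 + block.nesting))

def note_execution(block):
    """Count one execution; return the optimization to apply now, if any."""
    block.count += 1
    if block.level < len(LEVELS) and block.count >= threshold(block):
        step = LEVELS[block.level]
        block.level += 1
        block.count = 0      # start counting toward the next step
        return step
    return None

inner = Block("inner", size=2, nesting=1)          # threshold is 10
steps = [note_execution(inner) for _ in range(20)]
```

In this run the block receives `opt1` on its 10th execution and `opt2` on its 20th, so optimization effort grows gradually with observed frequency.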

To control the execution and invoke an optimization strategy requires a
supervisor. We will describe the operational characteristics for such a
supervisor in general, and specifically for a system employing the incremental
dynamic optimization strategy.

In the following chapters, we will describe the design, implementation
and performance of an actual system that employs incremental dynamic
optimization and incorporates the ideas expounded in this chapter. Since the
iterative dynamic optimization scheme in Section 2.4 is not pertinent to this
description, it may be bypassed on a first reading without loss of continuity.

2.1  Overall  Design  Considerations 

The primary goal in the design of an adaptive compiler system is to
minimize the total cost of running a program. This goal has direct
implications with respect to the design of the translator and the dynamic
optimization strategy.

The design of the translator can proceed along the lines currently
employed in the construction of any translator, but it must be as efficient as
possible. This means that: 1) it should expend a minimum of effort in
translating the source code, in particular, not performing any optimizations
that can be done more effectively and efficiently at run-time; 2) it should
employ the best translating algorithms available; 3) it should itself be
optimized; and 4) it should be one pass† and compile directly to core.

In terms of the dynamic optimization strategy, minimizing total cost
requires the optimization algorithms to be efficient, and the overhead incurred
by deciding when to perform what optimization on which sections of code to
be small compared to the expected payoff.

† A second pass over the object code is needed to complete the translation
process, e.g., allocate data storage, patch addresses, patch forward references,
relocate the object code, etc. This second pass of the compilation process is
handled by a program which, due to its similarity to others of the same
name, we shall call a loader.


There are four basic design decisions that must be made; they are a
consequence of both the fact that optimization is to be performed at run-time
and the nature of the optimizers. First, an internal representation of the
source code that can be efficiently manipulated by the optimizers must be
selected. This internal form cannot be machine language because at this level
too much information that will be needed by the optimizers has been lost,
and it should not be the source code because the source code does not
explicitly indicate the structure of the program and takes too much time to
scan. Possible internal forms include: Polish notation, quadruples, triples,
indirect triples, or trees (cf. [Gri71]). The translator can produce the internal
form as object code, for it is amenable to being executed (interpreted). For
the translator also to produce machine language would be a waste of effort
because empirical evidence shows that some sections of code will not be
executed often enough to warrant it.

Second, the internal form (which we will assume is an n-tuple that
denotes an "instruction") must be grouped according to the program’s
structure into aggregates so that the optimizations which require global flow
information can be applied. It is a characteristic of the classical optimizations
that we will employ (see Section 1.1) that they operate on two kinds of
aggregates: a group of sequential instructions terminated by an unconditional
branch (a basic block) and a group of basic blocks which form a loop-like
structure (a segment). Initially, as the internal form is being generated, it is
partitioned by the translator into basic blocks. As the program executes,
optimizations are performed on those basic blocks executed most frequently.
In order for additional optimizations to be performed, the segment containing
an optimized basic block must be formed. The process of combining basic
blocks or segments into a (larger) segment is called fusion, and constitutes a
new optimization.

Third, a dynamic optimization strategy must be proposed, i.e., a scheme
for determining which basic blocks or segments to optimize, which
optimizations to apply and when to perform the optimizations. Even though
there is more information available at run-time than at compile-time to aid in
making these decisions, it is not complete (we cannot predict a program’s
future behavior with absolute accuracy). A reasonable approach is to assume
that future behavior of a program will be similar to past behavior, for it is
better to base the decision-making on this information than none at all. Such
an assumption is not uncommon; it is often made in other areas of computer
science (e.g., paging algorithms and schedulers). As is the case with the
other areas, we are susceptible to anomalies. For example, it is possible to
waste optimization effort if the program terminates soon after the effort was
expended, or if the program is phased and, after the optimization of a phase,
that phase is executed only a few more times and then never again. By
selecting a strategy that causes optimization of basic blocks or segments to
be gradual, the amount of effort wasted can be kept tolerably small.

Finally, means for controlling the execution of a program so it can be
adapted must be determined. Basic blocks and segments are the discrete
units of execution. A logical point at which to interrupt a program’s
execution for adaption is when control passes between two basic blocks or
segments, since program status is well defined at such points, and the amount
of state information required to record this status is small. When execution
is interrupted, data which aids in the decisions made by the dynamic
optimization strategy is collected, and it is decided whether to invoke the
dynamic optimization strategy and perform optimizations. If control of
execution is distributed amongst the individual basic blocks and segments, then
appropriate instructions must be inserted in the code to perform the
functions just described. Another approach is to centralize these functions
and control of execution in a supervisor which causes the basic blocks or
segments to be executed one at a time. This is the approach we will follow
for the system to be implemented. The supervisor, known as the segment
driver, is advantageous for another reason. During the execution of the
program, some parts of it will be in interpretive code, while other parts will
be in machine language. The supervisor can conveniently decide whether to
execute a basic block or segment directly or call an interpreter.
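
The centralized control just described amounts to a dispatch loop. The sketch below illustrates that loop under stated assumptions and is not the thesis's segment driver: blocks are plain Python callables that return the name of their successor, the `compiled` flag stands in for "already translated to machine language", and the hook `on_transfer` is a hypothetical name for the point between blocks where data is collected and optimization may be triggered.

```python
def run(blocks, entry, on_transfer):
    """Execute blocks one at a time until a block returns no successor."""
    state, current, trace = {}, entry, []
    while current is not None:
        block = blocks[current]
        # A real driver would jump to machine code or call an interpreter;
        # here both are Python callables and only the mode is recorded.
        mode = "direct" if block["compiled"] else "interpreted"
        trace.append((current, mode))
        successor = block["code"](state)
        on_transfer(current, blocks)   # control is between two blocks here
        current = successor
    return state, trace

def init(s):
    s["i"], s["sum"] = 0, 0
    return "loop"

def loop(s):
    s["sum"] += s["i"]
    s["i"] += 1
    return "loop" if s["i"] < 5 else None

program = {"init": {"code": init, "compiled": False},
           "loop": {"code": loop, "compiled": False}}
counts = {}

def on_transfer(name, blocks):
    counts[name] = counts.get(name, 0) + 1
    if counts[name] == 3:
        blocks[name]["compiled"] = True   # stand-in for optimizing the block

state, trace = run(program, "init", on_transfer)
```

The loop block runs three times in interpreted mode and, once the hook has promoted it, twice in direct mode, with the program's result unchanged.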

The structure of an adaptive compiler system is now apparent. Source
code is translated by a fast and efficient translator into an internal form that
is grouped into basic blocks. Execution of the program is controlled by a
segment driver and optimization by a dynamic optimization strategy. Various
optimizations are applied to basic blocks and/or segments as execution
proceeds and the performance of the program warrants. A new optimization,
fusion, is necessary for grouping basic blocks into segments. In the following
sections, we will define more precisely the concepts of basic block, segment,
fusion, and segment driver, and propose two dynamic optimization strategies.

2.2 Basic Blocks and Segments†

† In this section, some of the definitions follow the terminology introduced by
Allen [All69, All70] (viz., basic block and directed control flow graph) and
Lowry and Medlock [Low69] (viz., predominance), while others pertaining to
directed graphs can be found in any introductory textbook on graph theory
(cf. [Har69]) (viz., immediate successor or predecessor, subgraph, path and
strongly connected region).

When performing code optimizations, it is advantageous to partition the
program according to its flow of control into basic blocks. A basic block is
a linear sequence of instructions with the first instruction being the single
entry point. The block is terminated by one or more branch instructions, the
last of which is unconditional while the others (if any) are conditional. All
code between the first instruction and the branches is executed in sequence.

A program’s flow of control may be represented by a directed control
flow graph in which a node represents a basic block and an edge represents
a flow path. Those basic blocks that branch to a given block are its
immediate predecessors. Likewise, those blocks branched to by a given basic
block are its immediate successors. A basic block may have more than one
immediate predecessor or successor, including itself. Program entry blocks
have no predecessors, and program terminating blocks have no successors. A
basic block $B_1$ predominates a block $B_2$ if every path along a sequence of
successors from a program entry block to $B_2$ always passes through $B_1$.
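
These graph definitions translate directly into code. The sketch below assumes a flow graph given as a hypothetical successor map with a single program entry block and with every block appearing as a key; predominance is computed with the standard iterative set-intersection method, which is a swapped-in textbook technique rather than anything prescribed by the thesis.

```python
def predecessors(succ):
    """Invert a successor map {block: [successors]} into a predecessor map."""
    pred = {b: set() for b in succ}
    for b, targets in succ.items():
        for t in targets:
            pred[t].add(b)
    return pred

def predominators(succ, entry):
    """Map each block to the set of blocks that predominate it (itself included)."""
    pred = predecessors(succ)
    blocks = set(succ)
    dom = {b: set(blocks) for b in blocks}   # start from "everything"
    dom[entry] = {entry}
    changed = True
    while changed:                           # shrink the sets to a fixed point
        changed = False
        for b in blocks - {entry}:
            incoming = [dom[p] for p in pred[b]]
            new = {b} | (set.intersection(*incoming) if incoming else set())
            if new != dom[b]:
                dom[b], changed = new, True
    return dom

# Invented example graph: block 2 heads a loop whose body branches at 2
# and rejoins at 5.
succ = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [2, 6], 6: []}
pred = predecessors(succ)
dom = predominators(succ, entry=1)
```

In this invented graph, block 2 predominates block 5 because both routes to 5 pass through it, while neither 3 nor 4 does.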

The basic block is the smallest program unit commonly considered for
optimization. However, there is a limit to the amount of optimization that can
be performed on a basic block, and in order to perform additional
optimization it is necessary to consider more global context. Since it is
desirable to optimize those basic blocks executed repetitively, some loop-like
structure must be imposed on the flow graph. Two loop-like constructs have
been described in the literature: the strongly connected region [All69] and
the interval [Coc70, All70]. A strongly connected region is a subgraph of
the flow graph in which there is a path leading from any block in the region
to every other block. The region may have several entry points. An
interval is the maximal single entry subgraph of the flow graph in which all
closed paths contain the entry block.

We introduce another similar, but not equivalent, concept called a
segment. With respect to a given strongly connected region (loop) in the
directed control flow graph, a segment is the minimal directed subgraph with
the following properties:

1) The segment contains all the basic blocks in the loop.

2) There is a single entry block. This entry block may
have one or more immediate predecessors of which at
least one is not contained in the segment. Those
immediate predecessors of the entry block that are
predominated by it are contained in the segment.

3) Except for the entry block, all immediate predecessors
of each basic block in the segment are contained in the
segment.

4) The segment, $A_i$, and another segment, $A_j$, are either
disjoint, i.e., they have no basic blocks in common and
therefore are parallel structures, or one is embedded
inside the other. If $A_i \cap A_j = A_i$, then $A_i$ is embedded
inside $A_j$, and $A_j$ is said to cover segment $A_i$.

Thus,  if  a  loop  has  a  single  entry  block,  it  is  identical  to  a  segment 
whose  segment  entry  block  is  the  same  as  the  loop  entry  block.  If  the  loop 
has  multiple  entry  points,  the  segment  is  the  loop  extended  to  include  the 
minimum  number  of  basic  blocks  satisfying  properties  2,  3  and  4.  Property  4 
defines  a  properly  nested  set  of  segments,  and  allows  the  optimizations  to  be 
ordered in the manner suggested by Allen [All69].
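
Property 4 can be checked mechanically when segments are represented as sets of basic block names. This representation and the function name `properly_nested` are assumptions for illustration only.

```python
from itertools import combinations

def properly_nested(segments):
    """True if every pair of segments is either disjoint or nested."""
    return all(
        not (a & b) or a <= b or b <= a
        for a, b in combinations(segments, 2)
    )

nested_ok = properly_nested([{2}, {2, 3}, {5, 6}])   # {2} inside {2, 3}
overlapping = properly_nested([{2, 3}, {3, 4}])      # share 3, neither covers
```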

Examples (flow graphs not reproduced): (a) Segments: 2' = {2,3}; (b) Segments: 2' = {2}, 2'' = {2,3}
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Unlike  a  strongly  connected  region,  there  is  not  necessarily  a  path 
leading  from  any  block  in  a  segment  to  any  other  block  because  of  ihe 
requirement  that  a  segment  have  a  single  entry  point.  A  segment  always 
contains  a  strongly  connected  region,  but  the  single  entry  point  frr  the 
segment  may  not  necessarily  be  contained  in  the  region.  Consider  example 
(d)  above  in  which  segment  2'  contains  the  strongly  connected  region  {4,5,6} 
which  has  two  entry  points  (blocks  4  and  5),  and  the  entry  point  of  the 
segment  (block  2)  is  not  contained  in  the  region.  Like  a  segment,  an  interval 
contains  a  single  entry  point,  but  it  does  not  necessarily  contain  a  closed 
path,  and  the  intervals  of  a  graph  are  disjoint.  In  example  (b)  above,  the 
intervals  are  {1}  and  {2,3}. 


The concept of a segment was chosen over that of a strongly
connected region or interval because of the simplicity of the algorithm for
constructing them. The algorithm is an iterative process. A block which is
executed repetitively can be used to start the segment. Given such a block,
and considering a segment already formed as a basic unit, it is possible to
construct the segment containing the block knowing just the immediate
predecessors of each basic block in the flow graph. A basic block’s list of
immediate predecessors can be constructed from the branches that terminate
each basic block. This can be done either by the compiler or by the loader.
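
The construction from predecessor lists can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm used in the system: the names `grow_segment` and `build_segment`, the brute-force choice of entry block, and the example flow graph are all hypothetical, the start block is assumed to have no predecessors, and the nesting requirement (property 4) is not checked.

```python
def grow_segment(loop_blocks, entry, preds):
    """Close the loop under 'immediate predecessor', never expanding past entry."""
    segment = set(loop_blocks) | {entry}
    worklist = [b for b in segment if b != entry]
    while worklist:
        block = worklist.pop()
        for p in preds[block]:
            if p not in segment:  # p cannot be the entry: it is already in the set
                segment.add(p)
                worklist.append(p)
    return segment


def build_segment(loop_blocks, preds, start):
    """Return (entry, segment) for the smallest candidate meeting properties 1-3."""
    best = None
    for entry in preds:
        segment = grow_segment(loop_blocks, entry, preds)
        if start in segment:
            continue  # the program's start block would act as a second entry point
        if all(p in segment for p in preds[entry]):
            continue  # property 2: some predecessor of the entry must lie outside
        if best is None or len(segment) < len(best[1]):
            best = (entry, segment)
    return best


# Hypothetical flow graph: block 1 is the start; {4, 5, 6} is a loop that can be
# entered at block 4 or at block 5, both of which are reached from block 2.
preds = {1: [], 2: [1], 4: [2, 6], 5: [2, 4], 6: [5]}
entry, segment = build_segment({4, 5, 6}, preds, start=1)
```

For a loop with a single entry block the sketch returns the loop itself, which agrees with the observation that such a loop is identical to its segment.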

2.3  Fusion 

When a segment is formed via fusion, which optimizations are applicable
to it depends on how embedded segments are treated, i.e., whether or not the
new segment is considered to be a homogeneous structure with respect to
future optimizations. Define the optimization state of a basic block/segment
to be the result of the application of an optimization. As optimizations are
performed on a basic block/segment, it will advance through different
optimization states. One basic block/segment is said to have a higher
optimization state than another if more optimizations have been applied to it.
If  we  employ  homogeneous  fusion,  then  the  optimization  state  of  a  segment  is 
uniform,  i.e.,  is  the  maximum  optimization  state  of  its  constituents,  and  the 
result  is  a  homogeneous  segment.  If  the  segment  contains  no  embedded 
segments,  then  its  optimization  state  is  the  maximum  optimization  state  of  its 
basic  blocks;  otherwise  it  is  the  maximum  optimization  state  of  the  segments 
it covers. In order to advance the segment to its optimization state it may
be necessary to perform one or more optimizations on its embedded basic
blocks/segments. For future optimizations, the covering segment could be
considered a discrete unit. But this has a distinct disadvantage if at the time
of fusion the embedded basic blocks/segments have not attained the highest
optimization state, for the effect of additional optimizations on these
embedded units will never be realized. Alternatively, if the identity of the
embedded basic blocks/segments is retained, further optimization could be
applied to them before being applied to the covering segment. The only
constraint is that all units attain the same optimization state.

If on the other hand we employ non-homogeneous fusion, a segment
and its embedded basic blocks/segments exist at different optimization states,
and the result is a non-homogeneous segment. This approach is more
restrictive  from  an  optimization  point  of  view  in  the  sense  that  the 
optimizations  that  are  applicable  to  a  segment  and  the  segments  it  covers 
depend  on  the  current  optimization  state  of  each.  There  are  a  number  of 
ways  optimizations  can  take  place.  One  method  is  to  let  the  covering 
segment  control  when  the  embedded  segments  get  optimized  further.  For 
example,  when  a  segment  is  to  be  optimized,  its  embedded  segments  could 
first  be  advanced  to  their  next  optimization  state,  starting  with  the  innermost 
one  and  working  outwards.  Other  approaches  are  to  let  each  segment  be 
optimized  separately  at  its  own  rate,  or  to  freeze  the  optimization  state  of 
each  embedded  segment  at  the  time  of  fusion  and  let  future  optimizations  be 
applied only to the covering segment.

The result of fusion is a machine language segment that is formed by
combining the machine language for each of the segment’s basic blocks. The
machine language segment is to be considered a basic unit with respect to
execution,  i.e.,  transfer  of  control  between  basic  blocks  within  the  segment 
should  not  be  processed  by  the  supervisor.  To  accomplish  this,  branches 
which  terminate  a  basic  block  must  be  treated  differently  depending  on 
whether  they  are  internal  or  external  to  the  segment.  An  internal  branch  is 
a  branch  in  which  the  basic  block  being  branched  to  (destination)  is  in  the 
same  segment  as  the  basic  block  containing  the  branch  (source),  and  the 
destination  is  not  the  segment  entry  block;  otherwise  it  is  an  external  branch. 

If  a  branch  is  internal  to  the  segment,  then  it  can  be  eliminated  if  it  is 
to  an  immediate  successor;  otherwise  it  can  be  performed  directly.  If  the 
branch  is  external  to  the  segment,  it  must  go  through  the  supervisor.  Once 
the  segment  becomes  totally  optimized,  any  branches  to  its  entry  block  can 
be  made  directly. 
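
The branch rules above can be summarized in a short sketch. It is illustrative only: the function name `classify_branch`, the three result labels, and the `next_block` map (which block is laid out immediately after which in the fused code) are assumptions introduced here, and the special case of a totally optimized segment is left out.

```python
def classify_branch(source, destination, segment, entry, next_block):
    """Return how a branch from `source` to `destination` is handled after fusion."""
    internal = (source in segment and destination in segment
                and destination != entry)
    if not internal:
        return "supervisor"  # external branch: control returns to the supervisor
    if next_block.get(source) == destination:
        return "eliminate"   # destination is laid out next, so fall through
    return "direct"          # jump directly inside the machine language segment


# Hypothetical segment {2, 3, 4} with entry block 2, laid out in the order 2, 3, 4.
segment, entry = {2, 3, 4}, 2
layout = {2: 3, 3: 4}
kinds = [classify_branch(s, d, segment, entry, layout)
         for s, d in [(2, 3), (2, 4), (4, 2), (4, 7)]]
```

Note that a branch back to the segment entry block counts as external under the definition given, which is what gives the supervisor a chance to intervene on every trip around the loop.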

When  forming  the  physical  segment,  it  is  not  clear  whether  to  perform 
homogeneous  or  non-homogeneous  fusion.  We  will  defer  this  discussion  until 
the  next  chapter  where  we  will  present  empirical  evidence  as  to  the  merits 
of  each.  There  is,  however,  a  general  observation  that  we  can  make.  When 
a  segment  is  formed  by  fusion,  the  segments  that  will  be  included  in  it  (if 
any)  will  have  reached  a  high  optimization  state,  if  not  the  highest.  This 
follows from the fact that embedded segments execute at a greater frequency
than their covering segment. Therefore, fusion that produces a homogeneous
segment may tend to apply too much optimization too soon, while producing a
non-homogeneous segment causes the optimization of the segment to be more
gradual. 
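
The difference between the two kinds of fusion can be made concrete with a small sketch. It assumes optimization states are represented as integers, and the `Unit` class, the two function names, and the `own_state` parameter are hypothetical; in a real system raising a unit's state would mean actually running the missing optimizations.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """A basic block or segment with an integer optimization state (assumed)."""
    name: str
    state: int = 0
    members: list = field(default_factory=list)


def fuse_homogeneous(name, members):
    """All constituents are advanced to the maximum state found among them."""
    target = max(m.state for m in members)
    for m in members:
        m.state = target  # stands in for applying the missing optimizations
    return Unit(name, target, list(members))


def fuse_nonhomogeneous(name, members, own_state=0):
    """Constituents keep their own states; the covering segment has a separate one."""
    return Unit(name, own_state, list(members))


inner, block = Unit("inner", state=3), Unit("b1", state=1)
outer_h = fuse_homogeneous("outer", [inner, block])

inner2, block2 = Unit("inner", state=3), Unit("b1", state=1)
outer_n = fuse_nonhomogeneous("outer", [inner2, block2])
```

In the homogeneous case the rarely executed block `b1` is pushed to state 3 at once, which is the "too much optimization too soon" effect noted above; in the non-homogeneous case it stays at state 1.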

Fusion  is  thus  seen  to  be  an  important  optimization,  for  it  determines 
the  highest  optimization  state  attainable  for  the  segments  involved,  and  thus 
has  a  strong  influence  on  a  program’s  performance.  How  to  incorporate 
fusion  into  the  overall  optimization  scheme  is  another  of  the  basic  problems 
in  controlling  dynamic  optimization.  We  now  present  two  dynamic  optimization 
schemes  based  on  different  metrics  that  treat  fusion  differently. 

2.4  Iterative  Dynamic  Optimization 

The  first  dynamic  optimization  scheme  is  based  on  a  cost  metric,  the 
total  run-time  cost  associated  with  executing  a  program  which  is  being 
adapted.  This  cost  consists  of  the  execution  cost  plus  the  optimization  cost, 
and  is  a  function  of  time  and  storage  space.  It  includes  optimization  costs  in 
order  to  guarantee  that  optimization  will  be  gradual  and  performed  when  it 
pays  to  do  so.  Informally,  we  want  to  minimize  the  total  run-time  cost  for  a 
program.  This  means  interrupting  the  program’s  execution  periodically  and,  by 
knowing its past behavior, determining how it should have been optimized so
that the total run-time cost would have been less than it actually was.

The scheme considers only the optimization of segments (which are
determined  prior  to  the  start  of  execution),  not  of  basic  blocks.  It  is  a 
non-homogeneous  optimization  scheme  in  which  the  rate  of  optimization  for 
segments  is  free  to  vary,  i.e.,  there  is  no  restriction  on  the  number  of 
segments  that  can  be  optimized  at  one  time,  or  the  number  of  optimizations 
that can be applied to each segment. For the latter, we make the restriction
that there be no backtracking, i.e., once a segment is optimized, it cannot be
"deoptimized" back to what it was previously. The goal is to find the
combination of optimizations that minimizes the cost metric. This approach is a
natural way to proceed, and has the advantage that fusion is not an issue
(which simplifies the formulation).

2.4.1  A  Mathematical  Model  for  Segment  Optimization 

Let

$$\Lambda = \{A_i\}, \quad i = 1, 2, \ldots, N$$

be the set of segments for the directed control flow graph of a program P.
Suppose there exists an ordered set of separate and distinct code
optimizations

$$\{O_j\}, \quad j = 1, 2, \ldots, m.$$

These optimizations are known as singular optimizations, and are ordered in
the sense of applicability, i.e., $O_1$ must be applied before $O_2$, $O_2$ before $O_3$,
etc. The composite optimization $O_{ij}$ ($i < j$) is the transformation $O_j \cdots O_i$ which can
be applied to a segment only if the transformation $O_{i-1} \cdots O_1$ has already been
applied.

The result of the application of one or more optimizations to a segment
A is called a representation, R(A), of the segment. For a segment $A_i \in \Lambda$, the
only possible ordered representations that it may attain are:

$$
\begin{aligned}
R_1(A_i) &= O_1 A_i \\
R_2(A_i) &= O_2(O_1 A_i) = O_{12} A_i \\
&\;\;\vdots \\
R_m(A_i) &= O_m(\cdots O_2(O_1 A_i) \cdots) = O_{1m} A_i
\end{aligned}
$$

$R_k$ is said to have a higher optimization state than $R_j$ if $k > j$.

The current representation of a segment is the result of the application
of the composite optimization $O_{1j}$, for some j. If the current representation
for segment $A_i$ is $R_j(A_i)$, then for additional optimizations the segment is
constrained to take on a new representation $R_k(A_i)$, where $j \le k \le m$. Not all
new representations are possible. A feasible representation is one which
does not violate the constraint that the optimizations are ordered. If one
segment is not covered by another, then there is no restriction on what new
representation it can attain. However, if one segment covers another, the
covering segment cannot be optimized such that its optimization state is
greater than that of the embedded segment. The new representation for the
segment defines the optimizations that must be applied. That is, since the
composite optimization $O_{1j}$ has already been applied, it is only necessary to
apply the composite optimization $O_{jk}$ (if $j = k$, then this is the null optimization).

Consider a subset $A \subseteq \Lambda$ which consists of $n \le N$ segments executed since
optimization was last performed. Define the n-tuple $R = \langle R(A_1), \ldots, R(A_n) \rangle$ with
the property that each $R(A_i)$ is a feasible, ordered representation of equal or
higher optimization state than the current representation of $A_i$. The goal
now is to find the n-tuple R that minimizes the cost metric.

Let $C_i(R(A_i))$ be the cost associated with segment $A_i$ being in
representation $R(A_i)$, i.e.,

$$C_i(R(A_i)) = C1_i(R(A_i)) + C2_i(R(A_i))$$

where $C1_i$ is the cost of executing segment $A_i$ in representation
$R(A_i)$,

$C2_i$ is the cost of changing segment $A_i$’s current
representation to $R(A_i)$.

The exact forms of C1 and C2 depend on how computer resources are
accounted for, but in general they are a function of execution time, T, and
core space, S. As a concrete example, consider the case where a user is
charged for how much processor time he uses and (just) the core he uses.
The cost associated with that part of core which is fixed is a constant that
can be ignored. This fixed storage includes the run-time support package,
data storage (since we are only considering code optimizations), and the
interpreter†. In order to simplify the model, we ignore the time required to
allocate and release core and to perform overlays. The form of C1 is then

$$C1_i(R(A_i)) = K_1 \, T_i(R(A_i)) + K_2 \, S_i(R(A_i)) \sum_{k=1}^{n} T_k(R(A_k))$$

where $K_1$ is the cost for processor time,

$K_2$ is the cost of core storage used per unit of time,

$T_i(R(A_i))$ is the time expended executing segment $A_i$ in
representation $R(A_i)$,

$S_i(R(A_i))$ is the amount of core needed to store the
representation $R(A_i)$ of segment $A_i$.

$C1_i$ is that fraction of the total cost for segment $A_i$, i.e., the sum of its
processor cost plus its core storage cost (the summation term represents
total execution time).

† The interpreter is always assumed to be in core because it simplifies the
formulation and it is highly likely that there will be at least some part of the
program to be interpreted (as evidenced by the empirical results on program
behavior).

The form of C2 is similar to that for C1, but now we need to know
the time to perform the transformation and the space occupied by each
optimizer. These optimizer-dependent parameters are easily obtained once
the optimizers are programmed. Suppose for each optimization, $O_j$, there
exists a function $E_j(q)$ which gives the execution time to perform the
optimization on a section of code consisting of q basic units, where a basic
unit is related to the internal form and may be the number of nodes in the
tree, the number of tuples, etc. Let $S(O_j)$ be the amount of core needed to
store the optimizer that performs optimization $O_j$. If segment $A_i$ consists of
$q_i$ basic units, then

$$C2_i(R(A_i)) = C3_i(O)$$

where $C3_i(O)$ is the cost to perform optimization $O$. If $O$ is the singular
optimization $O_j$, then

$$C3_i(O_j) = K_1 \, E_j(q_i) + K_2 \, S(O_j) \, E_j(q_i).$$

If $O$ is the composite optimization $O_{1j}$, then

$$C3_i(O_{1j}) = K_1 \sum_{k=1}^{j} \delta_k \, E_k(q_i) + K_2 \sum_{k=1}^{j} \delta_k \, S(O_k) \, E_k(q_i)$$

where $\delta_k = 0$ if $O_k$ has already been applied to $A_i$; otherwise 1. That is, if
the current representation of segment $A_i$ is $R_v(A_i)$, then the cost associated
with the composite optimization $O_{1v}$ is zero, and $C3_i(O_{1j})$ is just the cost of
performing the composite optimization $O_{vj}$, $j > v$. These equations assume the
optimizers reside in core only while they are needed, i.e., that they are
overlayed.

The total cost associated with a program in which segment $A_i$ is in
representation $R(A_i)$ is

$$C = \sum_{i=1}^{n} C_i(R(A_i)) \qquad (1)$$

The objective is to find an n-tuple, R, of representations such that C is a
minimum, i.e., solve

$$\min_{R} \; C \qquad (2)$$

subject to the constraints:

$$T_i(R(A_i)) > 0 \qquad (3)$$

$$\sum_{i=1}^{n} S_i(R(A_i)) \le S \qquad (4)$$

where S is the total amount of available core storage. Constraint (4)
requires that the new representations, R, for the segments all fit in core
simultaneously.

An  n-tuple  of  representations  that  solves  (2)  subject  to  the  constraints 
(3)  and  (4)  is  known  as  the  optimal  policy  with  respect  to  the  set  A  for 
executing  P.  The  initial  optimal  policy  for  executing  P  is  to  perform  no  code 
optimizations  and  interpret  the  internal  form  produced  by  the  compiler.  Such 
a  policy  is  in  keeping  with  the  philosophy  of  dynamic  optimization  (see 
Section  2.1).  After  P  has  executed  for  a  while,  a  new  optimal  policy  is 
determined according to (2). This policy is put into effect, and P allowed to
continue  execution.  Later  on,  P’s  execution  is  again  interrupted  and  a  new 
optimal  policy  determined.  This  process  is  continued  until  P  terminates. 
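
To make the model concrete, the following sketch solves (2) by exhaustive enumeration for a handful of segments. It is a sketch under stated assumptions rather than a description of any implemented solver: representations are numbered 0 (interpreted) through m, the tables `T`, `S`, `E`, and `S_opt` are assumed to be given as predicted values, and the function names and the `covers` list of (covering, embedded) index pairs are hypothetical.

```python
from itertools import product


def policy_cost(choice, current, T, S, E, S_opt, K1, K2):
    """Total cost C of equation (1) for one n-tuple of representations."""
    total_time = sum(T[i][r] for i, r in enumerate(choice))
    cost = 0.0
    for i, r in enumerate(choice):
        # C1: processor cost plus core cost over the total execution time.
        c1 = K1 * T[i][r] + K2 * S[i][r] * total_time
        # C2: only optimizations not yet applied (current[i] < k <= r) are charged.
        c2 = sum(K1 * E[i][k] + K2 * S_opt[k] * E[i][k]
                 for k in range(current[i] + 1, r + 1))
        cost += c1 + c2
    return cost


def optimal_policy(current, m, T, S, E, S_opt, K1, K2, core, covers=()):
    """Enumerate every feasible n-tuple and return (cost, choice) of the cheapest."""
    best = None
    for choice in product(*(range(c, m + 1) for c in current)):
        if any(T[i][r] <= 0 for i, r in enumerate(choice)):
            continue  # constraint (3)
        if sum(S[i][r] for i, r in enumerate(choice)) > core:
            continue  # constraint (4)
        if any(choice[outer] > choice[inner] for outer, inner in covers):
            continue  # a covering segment may not outrun a segment it covers
        cost = policy_cost(choice, current, T, S, E, S_opt, K1, K2)
        if best is None or cost < best[0]:
            best = (cost, choice)
    return best
```

The enumeration visits up to $(m+1)^n$ tuples, which illustrates why a combinatorial search of this kind is expensive to run repeatedly during execution.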

2.4.2  Practicality  of  Using  the  Model 

The iterative dynamic optimization strategy has three serious
disadvantages which make it impractical to use. First, it only determines
which segments to optimize and what their new representations should be,
not when to determine a new optimal policy. Second, there is the problem
of obtaining a numeric solution to (2). Any algorithm for solving (2) must be
such that the total solution time expended during a run is a small fraction of
the total execution time. To the best of our knowledge, the only way to
solve (2) is by a combinatorial search, which tends to be time consuming.
Third, there is the inability to generate the required data. In optimizing the
cost, it is necessary to determine the execution time, $T_i(R(A_i))$, and space
requirements, $S_i(R(A_i))$, for the new representation $R(A_i)$ of segment $A_i$. Since
the effect of an optimization on a segment cannot be ascertained without
actually performing the optimization, $T_i$ and $S_i$ have to be predicted. This is
undesirable because a policy so determined is only as good as the
predictions. However, if the program’s behavior is known a priori, it is
possible to solve (2) and obtain an absolute measure of performance (see
Chapter 5).

The model is therefore of more use in determining a standard against
which other schemes can be compared than in being used in practice. We now
present a more practical approach in which the rate of optimization is more
gradual than for the iterative dynamic optimization scheme.

2.5  Incremental  Dynamic  Optimization 

The incremental dynamic optimization scheme is based on the assumption
that the total execution time for a basic block/segment is proportional to the
frequency with which it is executed. This assumption allows a frequency
count to be used as a metric for deciding not only which basic block/segment
to optimize, but when to apply optimization. Each time the basic block or
segment is executed, this count is incremented†. When it exceeds a
predetermined threshold, the basic block or segment is advanced to the next
representation by the application of the next optimization. Therefore,
optimization is applied incrementally, i.e., one optimization at a time to one
basic block/segment at a time. Fusion is automatically handled by this
scheme since it is just one of the possible optimizations.

† In practice, the count is decremented until it becomes negative.

Define  the  optimization  count  for  an  optimization  to  represent  the 
number  of  times  a  basic  block  or  segment  is  to  be  executed  in  its  current 
representation  before  applying  the  optimization  (this  is  the  threshold  alluded 
to  above).  The  optimization  count  associated  with  a  basic  block  or  segment 
must  have  the  properties  that  it  is  proportional  to  the  basic  block/segment’s 
execution  time,  and  it  determines  the  proper  time  at  which  to  optimize  the 
basic  block/segment.  Therefore,  an  optimization  count  will  not  be  the  same 
for  each  basic  block/segment.  Instead,  it  will  be  some  function  of  the  basic 
block/segment’s  characteristics,  such  as  the  length  of  the  basic  block/segment 
measured in some appropriate units, the basic block/segment’s level of nesting
in a loop structure, or the amount of effort required to apply the next
optimization. 

The  optimization  counts  will  be  determined  empirically.  First,  they  will 
be  estimated  and  treated  as  constants,  then  an  empirical  study  made  to 
determine  what  function  of  the  basic  block/segment’s  characteristics  is  most 
appropriate. 

As an example of a possible function to study empirically, consider the
following method for deriving optimization counts. Assume time is a
measurement of effort, and a basic block/segment consists of q basic units,
where a basic unit depends on the internal form, e.g., for trees a basic unit
is a node, for n-tuples, an individual n-tuple, etc. Suppose for each
optimization, $O_j$, there exists a function $E_j(q)$, which represents the time to
perform the optimization on a basic block or segment consisting of q basic
units, and $t_{ik}$ is the time to execute basic block/segment $A_i$ once in the
ordered representation $R_k$. Then an estimate of the optimization count, $n_{ij}$,
for basic block/segment $A_i$ in the ordered representation $R_j$ is

$$n_{ij} = \frac{E_j(q_i)}{t_{i,j-1}}, \qquad j = 1, 2, \ldots, m$$

where m is the total number of distinct optimizations, and $t_{i0}$ is the time to
interpret the basic block/segment once. $n_{ij}$ is the number of times the basic
block/segment $A_i$ can be executed in representation $R_{j-1}$ before its total
cumulative execution time is the same as the time it would take to perform
the next optimization $O_j$. $n_{ij}$ represents an upper bound because it would be
wasteful to spend more time executing the basic block/segment than it would
take to optimize it. Therefore the actual optimization count used should be
some fraction of $n_{ij}$.

The quantity $t_{ik}$ must be estimated. When the basic block/segment is
first executed, let the supervisor clock its execution. This measurement is
exact for a basic block because its execution is sequential. But for a
segment it is an approximation since segments contain loops and internal
branching. Therefore, we can assume the segment’s timing to be exact only
if we assume its future behavior will be the same as its past behavior.
Knowing $t_{ik}$, the supervisor can now calculate the basic block/segment’s
optimization count.
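
The calculation the supervisor performs can be sketched as below. The function name, the `fraction` parameter, its default value, and the floor of one execution are assumptions made for illustration; the text only says that some fraction of the upper bound should be used.

```python
import math


def optimization_count(optimize_time, unit_time, fraction=0.5):
    """Estimate how often to execute a unit before applying the next optimization.

    optimize_time -- predicted time to apply the next optimization to the unit
    unit_time     -- measured time for one execution in the current representation
    fraction      -- assumed tuning knob: share of the upper bound actually used
    """
    if unit_time <= 0:
        raise ValueError("unit_time must be positive")
    upper_bound = optimize_time / unit_time  # break-even number of executions
    return max(1, math.floor(fraction * upper_bound))


# Hypothetical figures: optimizing takes 200 time units, one execution takes 0.5.
count = optimization_count(200.0, 0.5, fraction=0.25)
```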

2.6  The  Segment  Driver 

Optimization  and  execution  of  a  program  are  under  the  control  of  the 
segment  driver.  Execution  of  a  program  proceeds  one  basic  block  or  segment 
at  a  time.  At  the  start  of  execution,  the  segment  driver  is  called  with  a 
parameter  indicating  which  basic  block  or  segment  to  execute.  Before 
executing a basic block/segment, the segment driver decides whether or not
to optimize. If optimization is to be performed, it decides which basic
blocks/segments to optimize and how much, and calls the appropriate
optimizers. Then it executes the basic block/segment. If the executable code
is interpretive, a subroutine call is made on the interpreter; otherwise a
subroutine call is made on the machine language representation of the basic
block or segment.

During  the  execution  of  the  code,  there  may  be  a  call  on  a 
subprogram;  these  calls  may  be  nested.  To  execute  the  subprogram,  the 
segment  driver  is  called  recursively.  Execution  of  the  subprogram  proceeds 
as  just  described,  but  any  calls  on  the  interpreter  must  be  recursive,  for  the 
interpreter  may  have  made  the  subprogram  call. 

Execution  of  the  basic  block  or  segment  is  terminated  by  a  branch 
instruction to another basic block or segment. If the branch occurs in a
basic block or is external to a segment, control is returned to the segment
driver  by  a  subroutine  return  which  passes  back  the  destination  of  the 
branch  instruction.  Branches  internal  to  a  segment  are  performed  directly, 
while  a  return  from  a  subprogram  causes  an  exit  from  the  segment  driver. 

This  entire  process,  depicted  in  Figure  2.1,  is  repeated  until  an 
instruction that terminates the program is executed.


The  system  we  implemented  and  will  describe  employs  the  incremental 
dynamic  optimization  strategy.  The  segment  driver  for  such  a  system 
operates  as  just  described,  except  now  the  optimization  count  determines 
when  to  optimize. 


When a basic block/segment is to be executed, the segment driver
decrements its associated optimization count. If the result is negative, the
next optimization in sequence is performed; this calculates a new optimization
count for the basic block/segment. Then the basic block/segment is executed
as previously described. The modified flowchart of the segment driver is
given in Figure 2.2.
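
The count-driven execution cycle can be sketched in a few lines. This is a simplified illustration, not the hand-optimized driver of the actual system: each unit is assumed to be a dictionary holding a `count`, a `state`, and a `run` callable that returns the name of the next unit (or `None` when the program terminates), and recursive subprogram calls are not modeled. All names are hypothetical.

```python
def segment_driver(start, units, optimize):
    """Execute units one at a time, optimizing a unit when its count goes negative."""
    current = start
    while current is not None:
        unit = units[current]
        unit["count"] -= 1
        if unit["count"] < 0:
            # Apply the next optimization; it also supplies the new count.
            unit["count"] = optimize(unit)
        current = unit["run"]()  # returns the destination of the terminating branch


def make_demo():
    """Build a hypothetical one-unit program that runs its loop unit five times."""
    executions = {"n": 0}

    def run_loop():
        executions["n"] += 1
        return "loop" if executions["n"] < 5 else None

    def optimize(unit):
        unit["state"] += 1  # advance to the next optimization state
        return 2            # assumed constant optimization count

    units = {"loop": {"count": 2, "state": 0, "run": run_loop}}
    return units, optimize, executions
```

With an initial count of 2, the unit in this demonstration is executed twice in its original representation and is optimized just before its third execution.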


[Figure 2.1: The Execution Cycle for a General Segment Driver]

Chapter III

The  Adaptive  FORTRAN  System 

Whereas  the  previous  chapter  was  concerned  with  adaptive  systems  in 
general,  this  chapter  will  describe  a  particular  adaptive  FORTRAN  system;  this 
system  was  implemented  and  its  performance  has  been  measured. 
FORTRAN-IV was selected as the source language because: 1) it is one of the
most widely used programming languages and hence is a rich source of
example programs and comparisons with existing systems; 2) it contains
enough interesting constructs to give credibility to the validation results; and
3) many of the compile-time optimization algorithms currently in use were
developed for FORTRAN compilers; they are well understood and are easily
adapted for use at run-time.

The  Adaptive  FORTRAN  system  is  based  on  the  incremental  dynamic 
optimization  scheme  described  in  the  previous  chapter  (see  Section  2.5).  It 
employs  four  basic  optimizations  (constant  folding,  fusion,  common 
subexpression  elimination,  and  code  motion),  and  has  two  generators  for 
translating  the  internal  representation  of  the  source  code  (quadruples)  to 
machine language. The chapter is divided into two major sections. The first
section will describe the organization of the system (i.e., the different system
modules and the function of each), design criteria and implementation details.
The bulk of this section may be bypassed on a first reading without loss of
continuity. However, it is suggested that the introduction to Section 3.1.2 on
the system’s structure be read and Figure 3.1 be looked at. The second
section  is  important,  for  it  describes  the  final  system  and  how  it  was  arrived 
at  through  an  evolutionary  chain  of  systems.  The  latter  discussion  also 
includes  a  presentation  of  the  final  system's  optimization  states  and  their 
associated  optimization  counts.  We  will  defer  a  discussion  on  the  performance 
of  the  system  until  the  next  chapter. 

3.1  The  System’s  Design  and  Implementation  Specifications 

So that the Adaptive FORTRAN system may be clearly understood and
duplicated,  a  detailed  description  of  its  design  and  implementation  is 
presented. 

3.1.1  The  Adaptive  FORTRAN  Language 

In  order  to  demonstrate  that  our  technique  is  workable  and  valid,  it  is 
not  necessary  to  strictly  adhere  to  the  formal  definition  of  FORTRAN-IV  or  to 
implement  the  entire  language.  We  assume  the  reader  is  familiar  with 
FORTRAN-IV.  Instead  of  describing  the  complete  subset,  we  therefore  list  all 
the  features  in  FORTRAN-IV  that  were  altered,  extended  or  deleted.  The 
following  extensions  and  alterations  were  made: 

1)  allow  an  arbitrary  number  of  dimensions  for  arrays, 

2)  allow  multiple  assignment  statements, 

3) allow the use of real as well as integer control
variables in DO statements,

4) allow the use of parameters as initial, incrementation,
and terminal values in DO statements,

5)  allow  the  use  of  negative  increments  in  DO  statements, 

6)  allow  the  use  of  expressions  in  output  lists, 

7)  perform  automatic  conversion  of  real  to  integer  type 
for  subscripts,  in  relational  expressions  and  in  DO 
statements, 

8) include exclusive OR and equivalence as logical
operators.

The  following  features  were  deleted: 

1)  the  use  of  embedded  blanks  in  identifiers, 

2) the use of double precision and complex arithmetic as
types, 

3)  usage  of  the  computed  GO  TO  statement, 

4)  usage  of  the  PAUSE  statement, 

5)  usage  of  auxiliary  and  unformatted  I/O  statements, 

6)  the  use  of  the  DO-implied  specification  in  I/O  lists, 

7)  usage  of  the  DATA  statement, 

8)  usage  of  the  EQUIVALENCE  specification  statement, 

9)  the  use  of  statement  functions, 

10)  the  requirement  that  symbolic  names  which  identify 
statement  types  or  operators  may  not  be  reserved 
words, 

11)  the  ability  to  compile  program  units  separately. 

These modifications were made because they simplified the experiments
without affecting their results.

3.1.2 Structure of the System


The  process  of  running  a  FORTRAN  program  is  broken  down  into  three 
major  phases:  1)  the  compilation  of  FORTRAN  source  code  to  relocatable 
quads;  2)  the  loading  of  the  relocatable  quads  to  absolute  quads;  and  3)  the 
execution  of  the  program.  Execution  of  the  program  is  controlled  by  a 
supervisor  known  as  the  segment  driver  (see  Section  2.6)  which  conditionally 
invokes an optimizer before allowing a basic block or segment to execute.
Execution  of  a  basic  block  or  segment  is  performed  either  by:  1)  the 
interpreter  which  interprets  the  quads;  or  2)  the  machine  language  equivalent 
of  the  quads,  called  as  a  subroutine.  When  optimization  is  performed,  the 
optimizer  performs  transformations  on  the  quads  and  creates  a  machine 
language  segment  by  calling  appropriate  generators.  The  decision  of  when 
and  what  to  optimize  is  controlled  by  the  optimization  count  and  optimization 
state associated with the basic block/segment. However, performance of the
system  depends  on  the  optimization  states  selected  and  how  the  associated 
optimization  counts  are  determined. 

The  ability  to  change  these  two  optimization  control  parameters  easily 
and  thereby  produce  different  systems  whose  performance  can  be  studied 
was a major design criterion applied in the design and implementation of the
optimizers. Each optimizer is designed as a self-contained module which
accepts as an input parameter the basic block/segment to be optimized. It
either deduces all the information it needs to perform the optimization or
obtains it from the segment table (a common data structure accessible to all
optimizers).  An  optimizer  module  consists  of  two  subroutines:  one  which 
performs  the  optimization  algorithm  and  a  second  which  makes  all  the  control 
decisions associated with the optimization. Typical control decisions are:
performing possible setups, calling the optimization algorithm, changing the
optimization state and optimization count for the basic block/segment, calling
machine language generators, optionally outputting statistics about the
optimization (e.g., processing time, number of quads manipulated or modified)
and performing any cleanup functions. This modular construction isolates
those  few  parts  of  the  system  that  must  be  modified  in  order  to  produce  a 
different  experimental  system. 
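
The two-subroutine organization of an optimizer module can be sketched as follows. This is only an illustration in Python, not the BLISS-10 code of the system; the names `run_optimizer`, `drop_nops` and `fake_generator`, and the dictionary layout of a segment table entry, are assumptions made for the example.

```python
import time

def run_optimizer(entry, algorithm, generate, next_state, next_count, stats=None):
    """Control section of a hypothetical optimizer module.

    `algorithm` is the optimization algorithm proper; everything else
    here is a control decision of the kind listed in the text.
    """
    started = time.perf_counter()
    modified = algorithm(entry["quads"])          # call the optimization algorithm
    entry["currep"] = generate(entry["quads"])    # call a machine language generator
    entry["optstate"] = next_state                # change the optimization state
    entry["optcnt"] = next_count                  # change the optimization count
    if stats is not None:                         # optional statistics output
        stats.append({"seconds": time.perf_counter() - started,
                      "modified": modified})

# Toy stand-ins for an algorithm and a generator (both hypothetical).
def drop_nops(quads):
    before = len(quads)
    quads[:] = [q for q in quads if q[0] != "NOP"]
    return before - len(quads)

def fake_generator(quads):
    return ["code for " + q[0] for q in quads]

entry = {"quads": [("ADD", "A", "B", "T1"), ("NOP", None, None, None)],
         "currep": None, "optstate": 0, "optcnt": -1}
log = []
run_optimizer(entry, drop_nops, fake_generator, next_state=1, next_count=50, stats=log)
```

Producing a different experimental system then amounts to passing a different `next_state` or `next_count`, which mirrors the isolation the modular construction is meant to provide.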

The  structure  of  the  system  is  shown  in  Figure  3.1. 

The bulk of the system was written in BLISS-10 [Bli71], a systems
programming language for the DEC PDP-10. Those portions of the system
written in machine language were the segment driver (hand optimized to
minimize overhead) and the run-time FORTRAN support package (the
mathematical routines, I/O package, etc.) borrowed from the PDP-10 FORTRAN
system with slight modifications.

The entire system is loaded at once into approximately 50K 36-bit
words. This is not necessary; the three phases could be overlaid (and would
be in a production quality system). Again, this does not affect the validity of
the results.


3.1.2.1 The Compiler

The  first  phase  in  running  a  FORTRAN  program  is  the  translation  of  the 
FORTRAN  source  text  into  the  internal  form  manipulated  by  the  optimizers. 
The  internal  form  selected  is  a  quadruple,  or  quad  for  short,  which  consists 
of  an  operation,  OP,  two  operands,  A1  and  A2,  and  a  result  temporary,  T.  A 
quad  has  the  form: 

(OP, A1, A2, T).
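
As a minimal illustration (in Python, which is not the language of the original system), a quad can be modeled as a four-field record. The class name `Quad`, the operation names `MUL` and `ADD`, and the sample translation are assumptions made for this sketch; only the field layout and the REPL (replacement) form come from the text.

```python
from typing import NamedTuple, Optional

class Quad(NamedTuple):
    """One quadruple (OP, A1, A2, T); field names follow the text."""
    op: str                 # operation
    a1: Optional[str]       # first operand
    a2: Optional[str]       # second operand (None when unused)
    t: Optional[str]        # result temporary (or destination of a REPL)

# Hypothetical translation of the FORTRAN statement  X = A + B * C
quads = [
    Quad("MUL", "B", "C", "T1"),
    Quad("ADD", "A", "T1", "T2"),
    Quad("REPL", "T2", None, "X"),   # replacement: X <- T2
]
```

Each intermediate result receives its own temporary (T1, T2), which is the property the optimizers later rely on.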

The  compiler  is  one  pass  and  compiles  relocatable  quads  directly  to  core.  It 
occupies  approximately  9K  of  core  and  compiles  at  the  rate  of  nearly  9,000 
cards/minute.  Its  structure  is  modeled  after  an  ALGOL  compiler  written  by 
the author and colleagues [Hay71].

A  secondary  function  of  the  compiler  is  to  partition  the  program  into 
basic  blocks.  Code  is  compiled  into  a  basic  block  until  the  occurrence  of  one 
of  several  conditions  in  the  source  text,  at  which  time  the  basic  block  is 
terminated  and  another  one  started.  The  conditions  are: 

1)  a  labeled  statement  (except  a  FORMAT  statement), 

2)  a  subroutine  or  function  call  (except  for  a  library 
function  or  output  subroutine  call,  since  they  produce  no 
side  effects,  i.e.,  they  do  not  change  the  value  of  a 
variable), 

3) a "call exit" (e.g., STOP, RETURN) or END statement,

4)  statement(s)  which  cause  the  generation  of  a 
consecutive  sequence  of  conditional  transfer  operations 
possibly  terminated  by  an  unconditional  transfer  (see 
Appendix  A,  Section  A.2.3,  specifically  the  arithmetic  and 
logical IF),

5) a GO TO statement,

or 6) a READ statement.
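
The partitioning rule can be sketched as a single scan over the statements. The sketch below is a simplification under stated assumptions: statements are reduced to hypothetical `(label, kind)` pairs, a labeled statement is assumed to open a new block, and the other conditions are assumed to close the block after the statement itself. `CALL` stands only for calls that may have side effects; library and output calls would not appear as terminators.

```python
# Statement kinds that close the current basic block (conditions 2-6 above).
TERMINATORS = {"CALL", "STOP", "RETURN", "END", "IF", "GOTO", "READ"}

def partition(statements):
    """Split a list of (label, kind) statements into basic blocks."""
    blocks, current = [], []
    for label, kind in statements:
        # Condition 1: a labeled statement (other than FORMAT) starts a block.
        if label is not None and kind != "FORMAT" and current:
            blocks.append(current)
            current = []
        current.append((label, kind))
        if kind in TERMINATORS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

program = [
    (None, "ASSIGN"),
    (None, "READ"),
    (10,   "ASSIGN"),
    (None, "IF"),
    (None, "ASSIGN"),
    (None, "GOTO"),
    (20,   "STOP"),
    (None, "END"),
]
blocks = partition(program)
```

For this hypothetical program the scan yields five basic blocks, the third of which ends at the GO TO.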

During  compilation  each  result  generated  in  a  basic  block  is  associated 
with  a  unique  temporary  location.  This  is  to  facilitate  the  translation  of 
quads  to  machine  language  and  the  optimization  of  the  basic  block.  (Since 
these  are  intermediate  results  pertinent  only  to  the  basic  block  in  which  they 
occur,  a  different  basic  block  may  utilize  the  same  temporary  locations.  See 
Appendix B, Section B.1.)

3.1.2.2 The Loader

After  all  program  units  have  been  compiled,  the  relocatable  quads  are 
immediately  loaded  by  a  loader  (see  Section  2.1)  if  the  program  contains  no 
errors.  The  loader  occupies  less  than  0.5K  of  core,  and  is  very  fast  (all  the 
relocatable  quads  are  in  core). 

The  primary  function  of  the  loader  is  to  load  the  quads  into  absolute 
core locations; this requires changing relative locations to absolute and 'back
patching' address fields. Before the loading process can commence, the
loader must first determine how the program is to be laid out in core
memory, i.e., it must determine the starting absolute address for each
relocation base (the unit of storage into which code is compiled). All
compiled addresses are relative to one of several relocation bases: sequential
instruction storage, out-of-sequence instruction storage, own storage,
temporary storage, non-COMMON variable storage, blank COMMON storage,
labeled COMMON storage, library storage, and segment table storage.
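
The layout step can be illustrated with a small sketch. The base names, sizes and origin below are hypothetical, and laying the bases out consecutively in list order is an assumption; the text states only that each relocation base receives a starting absolute address.

```python
def layout(base_sizes, origin=0):
    """Assign a starting absolute address to each relocation base."""
    starts, address = {}, origin
    for name, size in base_sizes:       # bases are placed in list order
        starts[name] = address
        address += size
    return starts

def absolute(starts, base, offset):
    """Convert a (relocation base, relative offset) pair to an absolute address."""
    return starts[base] + offset

# Hypothetical sizes, in words, for three of the relocation bases.
sizes = [("sequential", 100), ("temporary", 20), ("segment_table", 40)]
starts = layout(sizes, origin=1000)
addr = absolute(starts, "temporary", 5)
```

Once `starts` is known, every compiled address can be made absolute by one addition, which is consistent with the loader being small and fast.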

The  second  function  of  the  loader  is  to  build  the  segment  table,  which 
is crucial for the adaptive process. Each entry in this table contains all the
information  about  a  basic  block  that  is  needed  by  the  optimizers.  A  single 
entry  in  the  table  consists  of  the  following  information  fields: 

1)  QUADREP:  The  address  of  the  basic  block’s  first  quad. 

Initialization  occurs  at  load  time  (the  compiler 
generates  the  starting  address  of  each  basic 
block  under  the  segment  table  relocation 
base). 

2)  CURREP:  The  absolute  address  of  the  current 

representation  of  the  basic  block.  When  the 
block  is  executed,  this  address  determines 
how  it  is  executed.  The  initial  value  is  the 
address  of  the  quad  interpreter;  when  the 
basic  block’s  quads  are  translated  to  machine 
language,  the  value  is  the  starting  address 
of  the  machine  language. 

3)  SEGNO:  The  segment  number  to  which  the  basic 

block  belongs  when  the  basic  block  is  fused 
into  a  segment.  Initially  it  is  equal  to  the 
basic  block  number.  After  fusion,  it  is  the 
block  number  of  the  segment’s  entry  block. 

Thus,  the  identity  of  an  embedded  segment 
is  lost.  In  the  case  of  non-homogeneous 
fusion,  embedded  segments  are  remembered 
by  saving  on  a  list  the  block  number  of  the 
first  and  last  block  of  the  segment.  This 
list  is  associated  with  the  covering  segment 
by  saving  a  pointer  to  it  in  another  field 
appended  to  the  segment  table. 

4) OPTCNT: The basic block/segment’s optimization count.

This field is decremented by the segment
driver each time it executes the basic
block/segment. When it goes negative, the
basic block/segment is optimized according
to the OPTSTATE field.


5)  OPTSTATE:  The  optimization  state  of  the  basic 

block/segment.  This  field  determines  which 
optimization  is  to  be  performed  next  on  the 
basic  block/segment  when  the  OPTCNT  field 
goes  negative. 

6) PREDPT: A pointer to the first item in the linked list

of immediate predecessors for the basic
block. This list contains, in increasing
order, the block numbers of all basic blocks
that are immediate predecessors of the block.

7)  LASTPRED:  A  pointer  to  the  last  item  in  the  basic 

block’s  immediate  predecessor  list. 

8)  QB:  The  address  of  the  first  quad  branch 

instruction  in  the  basic  block.  This  field  is 
used  when  it  is  necessary  to  move  the 
machine  language  for  the  basic  block  and 
the quad branch instructions must
consequently be retranslated to machine
language.

9)  MLB:  The  starting  address  of  the  machine 

language  translation  of  the  quad  branch 
instruction(s)  in  the  basic  block.  When  the 
machine  language  for  a  basic  block  is  moved, 
only  those  machine  language  instructions 
from  CURREP  to  this  address  need  be 
moved. 

10) AENTRY: The machine language address of the

alternate entry point to the segment’s entry
block. The segment’s invariant quads are
affixed to the start of the segment’s entry
block (see Section 3.1.3.3). When the
segment is translated to machine language,
the CURREP field points to the first machine
language instruction of the segment’s entry
block, i.e., to the invariant code. But the
invariant code need only be executed once,
hence any internal branch to the segment’s
entry block need only go to the alternate
entry point. When the quads for the
segment entry block are translated to
machine language, the AENTRY field is set so
all subsequent quads of the segment that
branch to the entry block will be translated
to branch to the address specified by it.

After  the  program  is  loaded,  the  loader  initializes  the  segment  table. 

The  fields  are  initialized  to  the  following  values: 

1)  CURREP  is  set  to  the  address  of  the  interpreter, 

2)  SEGNO  is  set  to  the  basic  block’s  block  number  which 
is  identical  to  the  entry’s  placement  in  the  segment 
table  (numbers  starting  at  1).  Thus  the  block  number  is 
used  as  an  index  into  the  table. 

3)  AENTRY  is  set  to  zero, 

4)  OPTCNT  is  set  to  a  constant  which  determines  how  long 
the  basic  block  is  to  be  interpreted  (see  Section  3.2), 

5)  OPTSTATE  is  set  to  zero  (see  Section  3.2  for  the 
possible  values  this  field  may  attain  and  their  meanings), 

6)  PREDPT  and  LASTPRED  are  set  as  the  quads  of  each 

basic  block  are  scanned  in  the  generation  of  the 
immediate  predecessor  lists.  The  quads  of  a  basic  block 
are  scanned  backwards,  since  in  order  to  determine 

immediate  predecessors  it  is  necessary  to  examine  only 
the  branch  instructions  which  terminate  the  basic  block. 

7)  QB  is  set  when  the  immediate  predecessors  are  being 

generated,  for  the  first  branch  instruction  in  the  basic 

block  is  the  last  branch  instruction  scanned  (see  (6) 
above). 
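
The initialization just described can be sketched as follows. The record type `Entry`, the constants `INTERPRETER` and `INITIAL_OPTCNT`, and the use of a Python list in place of the PREDPT/LASTPRED linked list are assumptions made for illustration; the actual initial count is discussed in Section 3.2. The `successors` argument stands for the branch targets found by scanning the terminating branches of each block.

```python
from dataclasses import dataclass, field
from typing import Dict, List

INTERPRETER = -1        # hypothetical stand-in for the interpreter's address
INITIAL_OPTCNT = 5      # hypothetical constant; see Section 3.2 for the real one

@dataclass
class Entry:
    quadrep: int                    # QUADREP: address of the first quad
    currep: int = INTERPRETER       # CURREP: initially the interpreter
    segno: int = 0                  # SEGNO: initially the block number
    optcnt: int = INITIAL_OPTCNT    # OPTCNT
    optstate: int = 0               # OPTSTATE
    preds: List[int] = field(default_factory=list)  # predecessor list
    aentry: int = 0                 # AENTRY

def build_table(quadreps: List[int], successors: Dict[int, List[int]]) -> Dict[int, Entry]:
    """Create one entry per block (numbered from 1) and fill predecessor lists."""
    table = {n: Entry(quadrep=q, segno=n) for n, q in enumerate(quadreps, start=1)}
    for block in sorted(successors):            # increasing block order keeps
        for target in successors[block]:        # each predecessor list sorted
            if block not in table[target].preds:
                table[target].preds.append(block)
    return table

# Three blocks: 1 -> 2, and 2 -> 2 or 3 (a loop on block 2).
table = build_table([100, 130, 170], {1: [2], 2: [2, 3]})
```

Because the block number doubles as the table index, looking up an entry is a direct access, as the text notes for SEGNO.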

After  the  segment  table  has  been  initialized  and  the  immediate 
predecessors  generated,  the  loader  examines  the  loop  structure  of  the 
program.  Based  on  the  loop  structure  it  changes  the  OPTSTATE  and  OPTCNT 
fields  of  certain  basic  blocks.  This  part  of  the  loader  is  dependent  entirely 
on  the  incremental  dynamic  optimization  scheme  employed.  Therefore  we 


defer  discussing  the  details  of  this  loop  structure  analysis  until  Section  3.2. 

The  loader  terminates  by  passing  control  to  the  segment  driver  and 
specifying to it the first basic block in the main program to be executed.

3.1.2.3 The Execution Phase

Execution  of  the  program  is  controlled  by  the  segment  driver 
(see  Section  2.6).  The  main  loop  of  the  segment  driver  consists  of  two 
machine  language  instructions:  one  decrements  the  OPTCNT  field  for  the  basic 
block/segment  being  executed  and  tests  if  the  count  has  gone  negative;  the 
other  calls  the  interpreter  or  machine  language  segment  as  a  subroutine.  If 
the  optimization  count  goes  negative,  the  basic  block/segment  is  optimized 
according  to  the  OPTSTATE  field  before  being  executed.  Execution  of  the 
basic  block/segment  is  terminated  by  a  branch  instruction  that  transfers 
control out of the basic block/segment. The branch behaves as a subroutine
return  so  control  is  returned  to  the  segment  driver,  which  executes  the  next 
basic  block/segment  specified  by  the  branch.  Thus  the  overhead  incurred  is 
two  machine  language  instructions  in  the  segment  driver  plus  the  number  of 
instructions  to  effect  the  branch.  If  the  branch  is  being  interpreted  the 
overhead  is  approximately  12  machine  language  instructions;  if  it  is  in  machine 
language,  the  overhead  is  two  instructions. 
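
The main loop can be sketched in a few lines. This is an illustration only: the real driver is two hand-optimized machine instructions, whereas here a segment table entry is a hypothetical dictionary whose `currep` is a Python callable returning the next block number. Stopping on block number 0 follows the convention the text gives for subprogram returns; the count assigned after optimization is an arbitrary placeholder.

```python
def run(table, start, optimize):
    """Execute blocks one at a time until block number 0 is returned."""
    block = start
    while block != 0:                   # block 0 stands for "leave the driver"
        entry = table[block]
        entry["optcnt"] -= 1            # decrement and test the count
        if entry["optcnt"] < 0:
            optimize(block, entry)      # selected by OPTSTATE in the real system
        block = entry["currep"](block)  # call interpreter or machine language

state = {"i": 0}
trace = []

def make_body(kind):
    def body(n):
        trace.append(kind)
        state["i"] += 1
        return 1 if state["i"] < 4 else 0   # loop on block 1, then exit
    return body

def optimize(n, entry):
    entry["currep"] = make_body("machine")  # swap in the "machine language"
    entry["optcnt"] = 10**6                 # placeholder: no further optimization

table = {1: {"optcnt": 1, "currep": make_body("interp")}}
run(table, 1, optimize)
```

With an initial count of 1 the block is interpreted once, optimized when the count goes negative on the second entry, and executed in its new form from then on.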

It should be pointed out that not only does the segment driver call the
interpreter, but that it is possible for the interpreter to call the segment
driver. Therefore, both routines must be recursive. The latter situation
arises when the interpreter calls a subprogram unit. The reason for the
recursive call is that the segment driver controls the execution and
optimization of the program, i.e., execution and optimization proceed one
basic block/segment at a time. Calling a subprogram unit is the only case in
which  the  execution  of  a  basic  block/segment  is  interrupted  while  other  basic 
blocks/segments  are  executed  (and  possibly  optimized).  Centralizing  the 
control  of  execution  and  optimization  in  the  segment  driver  provides  a  clean 
interface  between  the  interpreter,  the  optimizers,  and  the  program  sections  in 
machine  language,  and  enables  the  control  to  be  changed  easily  so  different 
systems  can  be  constructed  and  experimented  with.  The  segment  driver  can 
also  be  directly  called  recursively  if  the  basic  block/segment  is  in  machine 
language  and  contains  a  call  on  a  subprogram  unit.  The  reason  is  the  same 
as  for  the  indirect  recursive  call,  but  now  the  quad  calling  the  subprogram 
has  been  translated  to  equivalent  machine  language,  i.e.,  code  which  is 
identical  to  that  executed  by  the  interpreter.  A  subprogram  return  is  the 
only  branch  out  of  a  basic  block/segment  in  which  control  is  not  to  be 
passed  back  to  the  segment  driver,  but  back  to  the  point  where  the  segment 
driver  was  called  recursively.  To  effect  the  exit  from  the  segment  driver, 
the  subprogram  return  passes  back  a  block  number  of  zero  which  the 
segment  driver  executes.  The  CURREP  field  for  block  zero  in  the  segment 
table  points  to  an  alternate  entry  point  in  the  segment  driver  which  contains 
the exit code. Thus the same control mechanism is used to effect all
branches out of a basic block/segment.

Program  execution  thus  consists  of  executing,  via  the  segment  driver, 
one  basic  block/segment  at  a  time  with  optimizations  intermixed.  We  now 
turn  our  attention  to  the  various  optimizations  implemented. 

3.1.3  The  Optimizations 

Adaptive  FORTRAN  uses  four  machine  independent  optimizations:  constant 
folding,  non-homogeneous  fusion,  common  subexpression  elimination  and  code 
motion,  and  a  host  of  machine  dependent  optimizations.  There  are  a  number 
of  reasons  why  these  optimizations  were  selected  over  other  possibilities. 
First,  these  optimizations  are  the  most  commonly  used  ones.  Second,  they 
allow  us  to  construct  systems  similar  in  characteristics  to  existing  compilers 
against  which  it  is  possible  to  compare  the  Adaptive  FORTRAN  system 
(see  Chapter  4).  Third,  to  show  the  flexibility  of  the  system,  we  wanted  to 
include  optimizations  that  applied  both  to  basic  blocks  and  segments.  Finally, 
we  wanted  to  include  enough  optimizations  to  prove  the  technique  was  not 
only  feasible,  but  that  the  system  could  perform  at  least  as  well  as  current 
compiler  systems. 

There  are  two  machine  language  generators  which  apply  various 
machine  dependent  optimizations.  The  first  is  the  "dumb"  code  generator, 
which performs straightforward translation of quads to machine language. It
is used when individual basic blocks are being optimized. The second
machine language generator is the "fair" code generator, which is considerably
more  sophisticated.  It  utilizes  information  gathered  from  the  translation  of 
previous  quads  and  in  certain  cases  combines  consecutive  quads  in  order  to 
generate  more  efficient  machine  language.  It  is  used  to  generate  machine 
language  for  optimized  segments. 

Optimization  is  either  at  the  basic  block  level  (fusion  and/or  the  "dumb" 
code  generator),  or  the  segment  level  (common  subexpression  elimination  or 
code  motion  in  combination  with  the  "fair"  code  generator).  Regardless  of 
which  is  used,  the  net  effect  is  the  creation  of  machine  language  from  the 
basic  block/segment's  quads.  For  a  segment,  the  machine  language  for  each 
basic  block  must  occupy  consecutive  core  for  execution  purposes.  Therefore, 
it  is  built  piecemeal  by  appending  the  machine  language  for  successive  basic 
blocks  in  the  segment. 

If  an  optimization  has  no  effect  on  a  basic  block  and  the  proper 
machine  language  exists,  all  machine  language  instructions  except  those  for 
the  branches  (which  terminate  the  basic  block)  can  be  moved  because  they 
are  position  independent.  The  branches  must  be  retranslated.  (The 
instructions which must be moved can be determined from the CURREP and
MLB fields for the basic block in the segment table. The QB field specifies
where the quads are located for the branches that must be retranslated.)

If the machine language for the basic block does not exist, the proper
generator is called and it will compile the machine language directly to the
end of the machine language segment being formed. Since the segment is
built  piecemeal,  there  is  a  problem  with  forward  branches  to  blocks  not  yet 
processed.  This  is  handled  by  chaining  the  branch  instructions  together  and 
then  patching  them  when  the  block  is  processed. 
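
The handling of forward branches can be sketched as follows. The sketch assumes, for illustration, that each block is described by a hypothetical `(number, size, branch_targets)` triple, that every branch occupies one word at the end of its block, and that the chain of unresolved branches is kept in a dictionary rather than threaded through the address fields.

```python
def assemble(blocks):
    """Lay out blocks in order and resolve their branches.

    Returns the start address of every block, the resolved branches as
    (address_of_branch, target_address) pairs, and any unresolved chains.
    """
    start, pending, resolved = {}, {}, []
    address = 0
    for number, size, targets in blocks:
        start[number] = address
        # Patch every branch that was waiting for this block.
        for branch_addr in pending.pop(number, []):
            resolved.append((branch_addr, address))
        address += size
        for target in targets:              # branches sit at the block's end
            branch_addr = address
            address += 1
            if target in start:             # backward (or self) branch
                resolved.append((branch_addr, start[target]))
            else:                           # forward branch: chain it
                pending.setdefault(target, []).append(branch_addr)
    return start, sorted(resolved), pending

start, resolved, pending = assemble([(1, 3, [3]), (2, 2, [1, 3]), (3, 4, [])])
```

Both branches to block 3 are chained while blocks 1 and 2 are laid out and are patched together when block 3 is reached, so the segment is still produced in one forward sweep.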

The translation (or retranslation) of quad branches is handled specially
in  order  to  minimize  the  overhead  for  inter-block  transfers.  The  problem  is 
determining  the  correct  machine  language  to  be  generated  for  the  branch,  i.e., 
whether  any  should  be  generated  at  all,  and  if  so,  whether  the  machine 
language  should  perform  the  branch  directly  or  go  through  the  segment 
driver.  The  correct  decision  depends  on  whether  the  branch  is  internal  or 
external  to  a  segment  (see  Section  2.3).  For  an  external  branch,  the  machine 
language  goes  through  the  segment  driver  so  the  destination  will  be  optimized 
further.  In  the  case  of  an  internal  branch,  either:  1)  no  machine  language  is 
generated  if  the  branch  is  unconditional  and  the  destination  is  the  next  basic 
block;  or  2)  the  machine  language  performs  the  branch  directly  via  the 
CURREP/AENTRY  field  in  the  segment  table  because  optimization  of  the 
destination is controlled by its segment entry block. After the final
optimization has been performed on a segment, a branch in one of its basic
blocks to the entry block is considered to be internal so it will be performed
directly.
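
The decision can be summarized in a short sketch. It rests on two assumptions that the text does not state in this form: a branch is taken to be internal when both blocks carry the same SEGNO, and a branch to the entry block is treated as external until the final optimization has been applied (an inference from the last sentence above). The precise definition of internal and external branches is the one in Section 2.3.

```python
def branch_code(src, dst, segno, conditional, final=False):
    """Describe the machine language to emit for a branch from src to dst.

    `segno` maps block number -> segment number (the SEGNO field);
    `final` tells whether the final optimization has been performed.
    """
    same_segment = segno[src] == segno[dst]
    to_entry = dst == segno[dst]            # entry block number equals SEGNO
    if not same_segment or (to_entry and not final):
        return "driver"                     # external: go through the segment driver
    if not conditional and dst == src + 1:
        return "none"                       # unconditional branch to the next block
    return "direct"                         # internal: via CURREP/AENTRY

# Block 1 stands alone; blocks 2-4 form a segment whose entry block is 2.
segno = {1: 1, 2: 2, 3: 2, 4: 2}
cases = [
    branch_code(1, 2, segno, conditional=False),
    branch_code(2, 3, segno, conditional=False),
    branch_code(2, 4, segno, conditional=True),
    branch_code(4, 2, segno, conditional=False),
    branch_code(4, 2, segno, conditional=False, final=True),
]
```

Routing the loop-closing branch through the driver until the segment is fully optimized is what lets the entry block's count keep running.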

Whether  the  branch  is  external,  internal  via  CURREP  or  internal  via 
AENTRY is encoded in the quad (see Appendix A, Section A.3, specifically the
BTY tag). The current value of the tag aids in determining the correct
machine  language  to  be  generated  and  saves  having  to  regenerate  the 
information.  It  is  updated  whenever  the  branch  is  translated  or  retranslated 
to  machine  language  in  order  to  reflect  the  (possible)  change  in  status  of  its 
containing  basic  block  brought  about  by  the  application  of  an  optimization. 

We turn now to a brief description of each optimizer in order to give
a  clear  understanding  of  how  they  work  (and  their  limitations). 

3.1.3.1 Fusion

When  a  basic  block  has  been  executed  enough  times,  it  is  fused  into  a 
segment  having  the  properties  given  in  Section  2.2.  The  fusion  process 
consists  of  two  parts:  the  logical  determination  of  the  segment  containing  the 
basic  block  and  the  physical  creation  of  the  machine  language  segment. 

The logical segment is determined by the fusion algorithm which utilizes
the  immediate  predecessor  lists  and  the  SEGNO  field  in  the  segment  table  (for 
bypassing  the  examination  of  immediate  predecessor  lists  of  basic  blocks 
already  fused  into  a  segment).  As  a  consequence  of  the  algorithm,  a  segment 
consists  of  a  set  of  consecutively  numbered  blocks,  i.e.,  a  segment  is  a 
contiguous  section  of  the  segment  table.  After  the  segment  is  formed,  the 
SEGNO  fields  of  all  basic  blocks  in  the  segment  are  changed  to  be  the  block 
number of the segment entry block.

The physical machine language segment is created by the control
section of the fusion module. Adaptive FORTRAN uses non-homogeneous
fusion. If the machine language for a basic block already exists, it is used;
otherwise the basic block’s quads are translated to "dumb" code.

Finally,  the  fusion  optimizer  determines  the  new  optimization  state  and 
optimization  count  for  the  new  segment  (see  Section  3.2  for  the  precise 
values  used  and  how  the  optimization  count  is  determined). 

3.1.3.2 Common Subexpression Elimination (CSE)

The  CSE  optimizer  eliminates  common  subexpressions  from  a  basic  block. 
The  optimizer  is  not  applied  to  the  segment  taken  as  a  whole,  but  to  each 
basic  block  contained  in  the  segment  whose  optimization  state  indicates  CSE 
has  not  yet  been  performed  (embedded  segments  may  already  have  had  CSE 
performed  on  them). 

The  optimization  is  performed  on  the  quad  representation  of  the  basic 
block.  All  modifications  are  made  directly  to  the  quads;  temporary  locations 
may  therefore  be  used  more  than  once  (in  the  original  compiled  code  each 
result  of  a  basic  block  was  assigned  a  unique  temporary)  and  no-operation 
(NOP)  instructions  placed  where  common  subexpressions  have  been  eliminated. 

The  CSE  algorithm  makes  two  passes  over  the  basic  block’s  quads.  The 
prepass searches for replacement operations on simple variables and, using
this information, determines the limit for each quad, i.e., the first quad which
changes the value of one of its arguments. The limit of a quad puts a
bound  on  the  quads  that  must  be  searched  when  searching  for  a  common 
subexpression. 

The  second  pass  over  the  quads  searches  for  common  subexpressions, 
i.e.,  for  two  quads  that  have  identical  operation  codes  and  input  arguments. 
This  search  is  accomplished  by  scanning  forward  to  the  limit  of  the  quad.  If 
an  identical  quad  is  found,  it  is  replaced  by  a  NOP  and  the  usage  of  the 
result  temporary  for  the  NOP’ed  quad  is  searched  for  (it  must  occur  in  a 
quad  that  occurs  after  the  NOP’ed  quad  but  before  the  limit  of  the  quad)  and 
changed  to  be  the  result  temporary  of  the  identical  quad. 
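
The two passes can be sketched as follows. This is a simplified illustration rather than the thesis's implementation: quads are plain tuples `(op, a1, a2, t)`, the limit is recomputed by a scan instead of being tabulated in a prepass, commutativity is ignored, and every later use of the eliminated temporary is redirected (which is safe only because each result in a block has a unique temporary).

```python
NOP = ("NOP", None, None, None)

def limit(quads, i):
    """Index of the first quad after i that overwrites an argument of quad i."""
    args = {quads[i][1], quads[i][2]} - {None}
    for j in range(i + 1, len(quads)):
        if quads[j][3] in args:
            return j
    return len(quads)

def cse(quads):
    """Replace repeated quads inside one basic block by NOPs."""
    quads = list(quads)
    for i, (op, a1, a2, t) in enumerate(quads):
        if op in ("NOP", "REPL"):
            continue
        for j in range(i + 1, limit(quads, i)):
            if quads[j][:3] == (op, a1, a2):      # identical operation and arguments
                dup = quads[j][3]
                quads[j] = NOP
                # Redirect later uses of the duplicate's temporary to t.
                for k in range(j + 1, len(quads)):
                    o, x, y, r = quads[k]
                    quads[k] = (o, t if x == dup else x, t if y == dup else y, r)
    return quads

# X = A*B + C followed by Y = A*B + C
block = [
    ("MUL", "A", "B", "T1"), ("ADD", "T1", "C", "T2"), ("REPL", "T2", None, "X"),
    ("MUL", "A", "B", "T3"), ("ADD", "T3", "C", "T4"), ("REPL", "T4", None, "Y"),
]
optimized = cse(block)

# Here the assignment to A limits the search, so nothing is eliminated.
guarded = cse([("MUL", "A", "B", "T1"), ("REPL", "T1", None, "A"),
               ("MUL", "A", "B", "T2")])
```

In the first block both repeated quads become NOPs and the second replacement reads T2; in the second block the intervening assignment to A stops the scan before the repeated multiply is reached.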

Since the optimizer has already collected information on the location of
each  quad  involving  a  replacement  operation,  these  quads  are  searched  for 
pairs  from  which  the  intermediate  temporary  can  be  eliminated,  i.e.,  for  quad 
sequences  of  the  form: 

(OP,V,E,T) or (OP,E,V,T)

which can be collapsed to:

(OP,V,E,V) or (OP,E,V,V)

where  OP  is  a  binary  or  unary  operator,  V  is  a  simple  variable,  E  is  a  result 
temporary  or  simple  variable  and  T  is  a  result  temporary.  This  collapsing 
enables  the  machine  language  generators  to  produce  more  efficient  code,  and 
saves them from having to regenerate the same information in order to
perform the collapsing themselves.

Since  each  basic  block  of  the  segment  is  processed  separately,  the 
machine  language  segment  is  generated  simultaneously.  After  CSE  is 
performed  on  the  basic  block  its  quads  are  translated  to  machine  language 
using  the  "fair"  code  generator.  If  the  optimization  state  of  the  basic  block 
indicates CSE has already been performed, then the "fair" code already exists
and  it  is  simply  moved  in  a  manner  identical  to  that  previously  described. 

The  entire  process  is  controlled  by  the  control  section  of  the  module 
which  also  determines  the  new  optimization  state  for  each  basic  block  and 
the  new  optimization  count  for  the  segment. 

3.1.3.3 Code Motion (CM)

Code  motion  eliminates  invariant  quads  in  a  segment.  A  quad  is 
invariant  if  the  arguments  of  its  operation  are  invariant  within  the  segment. 
Invariant  quads  are  replaced  by  a  NOP  and  are  collected  together  in  a  new 
basic  block  called  the  invariant  code  block.  This  block  is  logically  appended 
to  the  segment’s  entry  block.  It  is  not  physically  appended  to  the  entry 
block  for  implementation  reasons:  1)  certain  optimizations  assume  (for 
efficiency  purposes)  that  the  quads  for  each  basic  block  occupy  contiguous 
memory  locations,  and  to  append  the  invariant  quads  would  require  moving 
quads  to  make  room  and  updating  the  segment  table;  and  2)  it  provides  a 
cleaner solution to the problem of how to translate to machine language
internal branches to the entry block, for these branches should not be to the
invariant code block.

To  logically  connect  the  invariant  code  block  with  the  segment  entry 
block,  the  invariant  block  is  terminated  by  a  special  branch  quad  of  the 
form:  (JUMP,EB,QREP,QBR).  EB  is  the  block  number  of  the  entry  block,  QREP 
is  the  QUADREP  field  from  the  segment  table  for  the  entry  block  and  is 
known  as  the  alternate  entry  point  to  the  segment,  and  QBR  is  the  QB  field 
from  the  segment  table  for  the  entry  block.  These  three  pieces  of 
information  constitute  what  is  needed  to  move  the  entry  block's  machine 
language or to generate its machine language. The invariant code block is
made  the  new  segment  entry  block  by  changing  in  the  segment  table  for  the 
old  entry  block: 

1)  the  QUADREP  field  to  the  address  of  the  first  quad  in 
the  invariant  block, 

and  2)  the  QB  field  to  the  address  of  the  special  branch  quad 
which  terminates  the  invariant  code  block. 

See Appendix B, Section B.1 for an example of an invariant code block,
the  machine  language  generated  for  it,  and  the  entry  block  associated  with  it 
(especially  the  code  generated  for  a  branch  to  the  alternate  entry  point). 

The  CM  algorithm  first  makes  sure  CSE  has  been  performed  on  each 
basic  block  in  the  segment  (no  machine  language  is  generated).  Then  in 
order  to  find  the  invariant  quads,  it  makes  two  passes  over  each  basic  block 
in the segment. In the first pass, it constructs a list of all variables or
indirect results that are not invariant. Using this list, it then searches each
basic block for invariant quads; however it processes only the invariant code
block for embedded segments which have already had CM applied to them.

Let  the  quad  being  processed  by  CM  be  of  the  form: 

Q1: (OP,A1,A2,T1)

If  the  quad  is  invariant,  i.e.,  its  arguments  A1  and  A2  are  invariant,  then  how 
it  is  processed  depends  on  whether  or  not  it  is  in  an  invariant  code  block 
and  if  it  already  exists  in  the  new  invariant  code  block. 

Suppose Q1 does not already exist in the new invariant block. If Q1 is
not in an invariant code block, then the quad (OP,A1,A2,T3) is added to the
new invariant code block, where T3 is a new unique temporary (using a new
unique temporary is necessary since basic blocks share the same temporary
locations). Then Q1 is replaced by the quad (REPL,T3,,T1), read T1←T3, if T1
must be in memory (see Appendix A, Section A.3, specifically the SR tag);
otherwise with a NOP. All occurrences of T1 occurring after Q1 in the basic
block are replaced by T3. If Q1 is in an invariant code block, T1 is a
unique temporary, so the quad (OP,A1,A2,T1) is inserted into the new
invariant code block and Q1 replaced with a NOP.

If on the other hand Q1 is already in the new invariant code block,
then it is a common subexpression that is invariant in more than one basic
block (recall CSE is performed only on individual basic blocks of a segment,
not on the segment taken as a whole). Let the common subexpression in the
new invariant code block be of the form: (OP,A1,A2,T2). Then if Q1 is in an
invariant code block, T1 is a unique temporary and it suffices to insert the
quad (REPL,T2,,T1) in the new invariant block and replace Q1 with a NOP. If
Q1 is not in an invariant code block, then Q1 is replaced by the quad
(REPL,T2,,T1) if T1 must be in memory; otherwise with a NOP. All
occurrences of T1 occurring after Q1 in the basic block are replaced by T2.

The net effect of the algorithm is to cause quads to "bubble" to the
outermost  segment  (loop)  of  which  they  are  invariant. 
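
A much simplified sketch of the algorithm follows. It makes several assumptions that go beyond the text: a variable is considered non-invariant only if it is the target of a REPL somewhere in the segment, non-invariant temporaries are tracked per block during the second pass rather than listed in the first, embedded segments and the "must be in memory" case are ignored (an eliminated quad always becomes a NOP), and the new unique temporaries are given hypothetical names H1, H2, and so on.

```python
from itertools import count

NOP = ("NOP", None, None, None)

def code_motion(blocks):
    """Hoist invariant quads out of a segment given as a list of basic blocks.

    Returns (invariant_block, rewritten_blocks).
    """
    # Pass 1: variables assigned anywhere in the segment are not invariant.
    variant = {t for block in blocks for op, _, _, t in block if op == "REPL"}

    fresh = (f"H{n}" for n in count(1))   # supply of new unique temporaries
    hoisted = {}                          # (op, a1, a2) -> temporary in invariant block
    invariant_block, result = [], []

    # Pass 2: search each basic block for invariant quads.
    for block in blocks:
        block = list(block)
        local = set()                     # temporaries of this block that vary
        for i, (op, a1, a2, t) in enumerate(block):
            if op in ("NOP", "REPL"):
                continue
            if {a1, a2} & (variant | local):
                local.add(t)              # result depends on a varying argument
                continue
            key = (op, a1, a2)
            if key not in hoisted:        # first occurrence: move the quad out
                hoisted[key] = next(fresh)
                invariant_block.append((op, a1, a2, hoisted[key]))
            new_t = hoisted[key]
            block[i] = NOP
            for k in range(i + 1, len(block)):   # rename later uses of t
                o, x, y, r = block[k]
                block[k] = (o, new_t if x == t else x, new_t if y == t else y, r)
        result.append(block)
    return invariant_block, result

segment = [
    [("MUL", "A", "B", "T1"), ("ADD", "T1", "I", "T2"), ("REPL", "T2", None, "X")],
    [("MUL", "A", "B", "T1"), ("REPL", "T1", None, "Y"),
     ("ADD", "I", "ONE", "T2"), ("REPL", "T2", None, "I")],
]
invariant_block, rewritten = code_motion(segment)
```

The product A*B is invariant in both blocks, so it is computed once into H1 in the invariant block; the second occurrence is recognized as already hoisted and simply reuses H1.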

The  control  section  of  the  CM  module  invokes  the  CM  algorithm  and 
then  generates  the  machine  language  segment.  For  those  basic  blocks  in 
which  invariant  code  was  removed  and  for  the  entry  block  to  which  invariant 
code was appended, machine language is regenerated using the "fair" code
generator. The ("fair") machine language for the remaining basic blocks
already  exists  and  is  moved  in  a  manner  identical  to  that  previously 
described. 

As  is  the  case  for  the  other  optimizers,  the  final  function  performed  by 
the  module’s  control  section  is  to  determine  the  new  optimization  state  for 
the  basic  blocks  of  the  segment  and  the  new  optimization  count  for  the 
segment  (see  Section  3.2  for  the  exact  values  used). 

3.1.3.4 The "Dumb" Code Machine Language Generator

The  "dumb"  code  generator  operates  on  basic  blocks,  and  is  invoked 
when  it  is  no  longer  advantageous  to  interpret  a  basic  block  or  when  a  basic 
block  is  fused  into  a  segment  and  is  still  in  interpretive  code  form.  In 
keeping  with  the  philosophy  of  incremental  dynamic  optimization  (i.e.,  gradual 
optimization  of  a  section  of  code),  it  is  a  fairly  straightforward  translation  of 
quads  to  machine  language,  and  employs  a  trivial  register  allocation  scheme 
and  some  of  the  less  sophisticated  machine  dependent  optimizations. 

The  register  allocation  algorithm  uses  four  working  registers  that  it 
assigns  on  a  round  robin  basis.  When  a  register  is  needed,  the  algorithm 
checks  if  the  register  after  the  last  register  assigned  is  free.  If  not,  it 
generates  code  to  store  the  register  in  its  associated  temporary.  A 
temporary  is  associated  with  a  register  when  it  is  the  result  temporary  of  a 
quad.  Once  the  temporary  is  used  as  an  argument,  it  is  disassociated  from 
the  register  because  each  result  generated  in  a  basic  block  uses  a  unique 
temporary.  Variables  are  not  associated  with  a  register.  For  those 
operations (e.g., integer division) that require two consecutive registers, a
single  register  is  first  located  in  the  manner  just  described.  Then  if  the  next 
higher  register  is  in  use,  code  is  generated  to  store  it. 
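
The round-robin scheme just described can be sketched as follows. This is only an illustrative model in Python, not the thesis implementation: the class name, the `STORE` pseudo-instruction, and the register numbering are assumptions, and the two-consecutive-register case is left out because the text does not say what happens when the highest register is selected first.

```python
from typing import List, Optional

NUM_WORKING_REGS = 4  # the text states four working registers


class RoundRobinAllocator:
    """Hypothetical model of the round-robin scheme; not the thesis code."""

    def __init__(self) -> None:
        # owner[r] is the temporary currently held in register r, if any.
        self.owner: List[Optional[str]] = [None] * NUM_WORKING_REGS
        self.last = NUM_WORKING_REGS - 1  # so register 0 is handed out first
        self.code: List[str] = []         # pseudo-instructions emitted so far

    def allocate(self, result_temp: str) -> int:
        """Take the register after the last one assigned, storing it if busy."""
        reg = (self.last + 1) % NUM_WORKING_REGS
        held = self.owner[reg]
        if held is not None:
            # STORE is a made-up mnemonic standing for "save to the temporary".
            self.code.append(f"STORE R{reg},{held}")
        self.owner[reg] = result_temp
        self.last = reg
        return reg

    def use(self, temp: str) -> Optional[int]:
        """Using a temporary as an argument disassociates it from its register."""
        for reg, held in enumerate(self.owner):
            if held == temp:
                self.owner[reg] = None
                return reg
        return None  # the temporary was stored to memory earlier


alloc = RoundRobinAllocator()
first_four = [alloc.allocate(f"T{i}") for i in range(1, 5)]
alloc.use("T2")               # frees register 1
fifth = alloc.allocate("T5")  # round robin still examines register 0 next
```

Note that the fifth request stores T1 from register 0 even though register 1 is free. The scheme looks only at the next register in sequence, which keeps the allocator trivial at the cost of an occasional unnecessary store.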

Generation  of  the  machine  language  is  table  driven.  For  each  possible 
quad  op-code,  there  is  a  control  word  which  specifies  what  machine  language 
is  to  be  generated  and  how.  The  control  word  is  broken  into  a  number  of 
fields: the type of the operation, the register specification of the arguments,
the register specification of the result, whether the quad has embedded
machine language, the number of machine language instructions to be
generated, a pointer to the machine language instructions, an indicator for
CSE and CM eligibility, and a switch to differentiate between conditional and
unconditional branches. Encoded in the address fields of the machine
language instructions are integers specifying which argument of the quad to
use.
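
A minimal sketch of such a table-driven generator is given below, assuming a Python representation. The field names, the `IADD` op-code, the fixed register `R1`, and the string form of the output are all hypothetical; the instruction count and pointer fields are merged into one tuple of templates. Only the idea that each template's address field holds an integer selecting a quad argument is taken from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class OpType(Enum):
    COMMUTATIVE_BINARY = 0
    NONCOMMUTATIVE_BINARY = 1
    UNARY = 2
    OTHER = 3


@dataclass(frozen=True)
class ControlWord:
    op_type: OpType
    arg_reg_spec: int        # register specification of the arguments
    result_reg_spec: int     # register specification of the result
    has_embedded_ml: bool    # quad carries embedded machine language
    # Each template is (mnemonic, selector); the selector plays the role of the
    # integer in the address field that says which quad argument to use.
    templates: Tuple[Tuple[str, int], ...]
    cse_cm_eligible: bool
    conditional_branch: bool


# One illustrative entry; the op-code name and encoding values are invented.
CONTROL_TABLE: Dict[str, ControlWord] = {
    "IADD": ControlWord(OpType.COMMUTATIVE_BINARY, 1, 1, False,
                        (("MOVE", 1), ("ADD", 2)), True, False),
}


def expand(quad: Tuple[str, str, str, str], reg: str = "R1") -> List[str]:
    """Expand a quad (op, arg1, arg2, result) using its control word."""
    word = CONTROL_TABLE[quad[0]]
    return [f"{mnemonic} {reg},{quad[selector]}"
            for mnemonic, selector in word.templates]


generated = expand(("IADD", "A", "B", "T1"))
```

Here the result is simply left in the register, in keeping with the association of result temporaries with registers described above.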


The  generator  does  not  make  a  fine  distinction  between  the  op-codes 
and  therefore  does  not  generate  specialized  code  to  handle  each  situation, 
but  instead  classifies  the  op-codes  into  four  groups.  The  operation  type  field 
in  the  control  word  specifies  which  class  the  op-code  belongs  in:  commutative 
binary,  non-commutative  binary,  unary  and  all  others.  The  quad  is  processed 
according  to  this  operation  type. 


As  a  result  of  this  classification  of  operations,  the  number  of  machine 
dependent  optimizations  that  can  be  performed  is  limited.  These  optimizations 
consist  of: 

1)  the  use  of  "immediate"  instructions  for  literal  constants 
(constants  less  than  18  bits), 

2)  the  use  of  indexing  for  indirect  results, 

and 3) recognizing, for a binary or unary operation, that an
argument is already in a register and using that register
to form the result.

The translation of quad branches to machine language is a special case;
processing  is  as  previously  described.  If  machine  language  is  generated  for 
an  internal  branch,  it  performs  the  branch  directly  through  the  CURREP  field 
in  the  segment  table. 

The  "dumb"  code  generator  occupies  approximately  1.5K  of  core.  It  is 
fairly fast, taking on average 550 µs to process a quad. The generated
code executes approximately 9 times faster than interpreting the
equivalent quads.

3.1.3.5 The "Fair" Code Machine Language Generator

The  "fair"  code  generator  is  applied  to  segments,  one  basic  block  at  a 
time.  It  is  invoked  after  the  CSE  or  CM  optimizer  has  been  applied  to  the 
segment. 

Generation  of  the  machine  language  involves  a  thorough  case  analysis 
of  the  variables  for  each  operation  in  order  that  the  most  appropriate 
PDP-10 instructions can be used. The PDP-10 instruction set is quite
extensive;  most  instructions  have  a  basic  form  plus  a  number  of  variants.  To 
utilize  the  complete  instruction  set  and  therefore  generate  the  ultimate 
machine  language  would  involve  an  unreasonable  amount  of  effort,  certainly 
more  than  necessary  to  validate  our  approach.  Therefore,  the  operations 
were ranked according to frequency of usage with a corresponding detailed

The case analysis for the binary and unary operators is based on the
mode  of  the  arguments  involved.  The  possible  modes  and  a  brief  reason  for 
each  are: 

1) MEM: argument in memory. This mode handles variables that
are in memory and results that have to be stored in
order to free a register.

2) REG: argument in a register. This mode is for retaining
variables across replacement statements and
intermediate results.

3) NUM: argument is a number. This mode permits the
processing of literal constants (constants less than
18 bits) and constant folding.

4) REG+NUM: argument is the result of adding the contents of
a register to a number. This mode delays the
generation of the addition so that if the argument
is used as an indirect result, indexing can be used
(the NUMber is the address field and the REGister
the indexing register).

Using  the  mode  of  an  argument  as  a  coordinate  label  and  an  argument 
to label each dimension, a code array is constructed for each binary and
unary operation (cf. [Gri71]). Each element of the array contains the code to
be  generated  for  that  particular  case.  For  the  binary  operators,  there  are 
16  possible  cases,  while  for  the  unary  operations  there  are  only  four  cases 
corresponding  to  the  four  modes. 
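
The code array can be pictured as a table indexed by the modes of the two arguments. The following is a minimal sketch under stated assumptions: the handler functions, the `(mode, payload)` representation, and the choice of which cases to fill in are illustrative only. The two modeled cases follow the reasons given for the NUM mode (constant folding) and the REG+NUM mode (delayed addition); the remaining fourteen are placeholders.

```python
from enum import Enum
from itertools import product
from typing import Callable, Dict, List, Tuple


class Mode(Enum):
    MEM = "MEM"
    REG = "REG"
    NUM = "NUM"
    REG_NUM = "REG+NUM"


# An argument or result is modeled as (mode, payload).
Value = Tuple[Mode, object]
Handler = Callable[[object, object], Tuple[Value, List[str]]]


def not_modeled(x: object, y: object) -> Tuple[Value, List[str]]:
    raise NotImplementedError("case not modeled in this sketch")


def add_num_num(x, y):
    # Both arguments are numbers: fold the constant, generate no code.
    return (Mode.NUM, x + y), []


def add_reg_num(reg, num):
    # Register plus number: delay the addition, generate no code yet.
    return (Mode.REG_NUM, (reg, num)), []


# One entry per pair of argument modes: 4 x 4 = 16 cases.
ADD_MATRIX: Dict[Tuple[Mode, Mode], Handler] = {
    pair: not_modeled for pair in product(Mode, repeat=2)
}
ADD_MATRIX[(Mode.NUM, Mode.NUM)] = add_num_num
ADD_MATRIX[(Mode.REG, Mode.NUM)] = add_reg_num


def generate_add(a: Value, b: Value) -> Tuple[Value, List[str]]:
    """Select the matrix element by argument modes and run it."""
    handler = ADD_MATRIX[(a[0], b[0])]
    return handler(a[1], b[1])


folded, code1 = generate_add((Mode.NUM, 2), (Mode.NUM, 3))
delayed, code2 = generate_add((Mode.REG, "R4"), (Mode.NUM, 8))
```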

Most  of  the  cases  are  subdivided  into  subcases.  The  correct  subcase 
is  selected  according  to  information  stored  in  either  of  two  data  structures: 
the  temp  table  or  the  register  table.  There  is  one  entry  in  the  temp  table 
for each temporary used in the basic block; each entry consists of six fields:

1)  Mode  of  temporary  result: 

a)  MEM:  result  has  been  stored  into  memory 

b)  NUM:  result  is  a  folded  number 

c)  REG:  result  is  in  a  register 

d)  REG+NUM:  result  is  a  register  plus  a 
number 

2)  Register  associated  with  temporary,  i.e.,  the  register  the 
result  occupies. 

3)  Range  of  temporary,  i.e.,  the  address  of  the  last  quad 
that  uses  the  temporary.  When  the  quad  is  processed, 
the  temporary  is  disassociated  from  the  register  it  is  in. 

4)  Neg-bit,  which  indicates  the  negative  of  the  temporary 
is  required.  This  bit  permits  the  generation  of  negation 
instructions  to  be  delayed,  and  therefore  allows  multiple 
negations  to  cancel  one  another  or  special  instructions 
to  be  generated  (e.g.,  load/store  negative,  subtract 
instead  of  add,  etc.). 

5)  Information  field,  which  contains  the  address  of  a 
constant  or  the  value  of  a  folded  constant  or  literal. 

6) Number indicator, which identifies the number in the
information field; either:

a) the number is not a result of folding and
the information field contains the address
of the constant,

b) the number is the result of folding and
the information field contains the value of
the constant (which is not a literal).
Whenever an instruction is generated that
uses this constant, storage must be
assigned for it and initialized to its value,

c) the information field contains the value
of a literal (folded or otherwise).
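
As an illustration of how a temp table entry and its neg-bit might be used, consider the sketch below. The field names and string encodings are assumptions made for the example. MOVE and MOVN are the PDP-10 load and load-negative mnemonics; whether the generator emits exactly these in each case is not stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TempEntry:
    mode: str                          # "MEM", "NUM", "REG" or "REG+NUM"
    register: Optional[int]            # register the result occupies, if any
    last_use: int                      # address of the last quad using the temporary
    neg_bit: bool = False              # negative of the temporary is required
    info: Optional[int] = None         # constant address, folded value or literal
    number_kind: Optional[str] = None  # "address", "folded" or "literal"


def negate(entry: TempEntry) -> None:
    """Record a negation without generating code; two negations cancel."""
    entry.neg_bit = not entry.neg_bit


def load_instruction(entry: TempEntry, reg: int, address: str) -> str:
    """Pick a load that absorbs a pending negation."""
    mnemonic = "MOVN" if entry.neg_bit else "MOVE"
    return f"{mnemonic} {reg},{address}"


t = TempEntry(mode="MEM", register=None, last_use=12)
negate(t)
once = load_instruction(t, 1, "T4")
negate(t)
twice = load_instruction(t, 1, "T4")
```

A single pending negation is folded into the load, while a second negation simply clears the bit, so no negation instruction is generated in either case.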


The register table contains one entry for each working register; each
entry consists of eight fields:

1) Mode of the register:

a) register has no associated temporary.

b) register has an associated temporary
whose mode = "REG".

c) register has an associated temporary
whose mode = "REG+NUM".

2) The use of the register, which indicates how many
temporaries with mode = "REG+NUM" are associated with
the register.

3) The variable counter, which indicates how many
variables are associated with the register. This allows
variables to be retained in registers after a replacement
operation and thereby possibly avoids the generation of
a redundant load instruction.

4) The address of the associated temporary with
mode = "REG".

5) Fields for specifying the address of variables associated
with the register (there may be up to four).

The  information  contained  in  these  two  data  structures  permits  the 
following  machine  dependent  optimizations: 

1)  constant  folding, 

2)  use  of  special  instructions  to  set  memory  to  0  or  -1, 

3)  use  of  shift  instructions  for  multiplication  or  division  by 
powers  of  2, 

4) delaying negation operators to make use of load/store
negative instructions, permitting the usage of complement
instructions for an operation, or deleting successive
negation operations,

5) use of "immediate" instructions for operations involving
literal constants as arguments,

6) use of indexing for indirect results (subscripting),

7) performing operations directly to memory, e.g.,
incrementation or decrementation by a literal constant or
for quads of the form: (OP,V,E,V) where V is a simple
variable and E is a simple variable or result,

8) performing operations both to memory and a register
simultaneously, e.g., for quads of the form: (OP,V,E,V).

These  optimizations  are  but  a  small  sample  of  the  optimizations  that 
could  be  performed  if  we  were  to  exploit  the  full  instruction  set  of  the 
PDP-10.  They  were  selected  because  they  have  a  high  payoff  for  the  effort 
invested. 
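
To make one of these optimizations concrete, the sketch below shows how a multiplication by a power of 2 can be recognized and turned into a shift. It is an illustration in Python rather than PDP-10 code, the function names are hypothetical, and only multiplication is modeled; division by shifting would need extra care for negative operands and is not attempted here.

```python
from typing import Optional


def shift_count(constant: int) -> Optional[int]:
    """Return k if constant == 2**k for some k >= 0, otherwise None."""
    if constant > 0 and (constant & (constant - 1)) == 0:
        return constant.bit_length() - 1
    return None


def multiply(value: int, constant: int) -> int:
    """Multiply by a constant, using a shift when the constant is a power of 2."""
    k = shift_count(constant)
    if k is not None:
        return value << k      # the shift replaces the multiply instruction
    return value * constant    # general case


product_by_shift = multiply(5, 8)
product_general = multiply(5, 6)
```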


The operations were broken down into three classes with varying
degrees of analysis applied. The most detailed analysis is performed on the
integer arithmetic operators: binary and unary minus, since integer
arithmetic is required in frequently used language constructs (e.g., for counter
variables that control the number of times a loop is executed or for
subscript variables that reference array elements). For the binary operators
there are three code matrices, each designed to handle quads of a specific
form (see Appendix C for the V code matrix). The three forms are:

(BINOP,E1,E2,T)

(BINOP,V,E,V)

(BINOP,E,V,V)

where BINOP is one of the binary operators; E, E1 and E2 are either
simple variables, results or indirect results; V is a simple variable; and T is a
temporary. There are two code vectors for unary minus. One handles quads
of the form (-,E,,T) while the other is for quads of the form (-,V,,V).

The next class of operations consists of the floating point arithmetic
and logical operators. There is one code matrix to handle the binary floating
point  arithmetic  and  logical  operators.  Parameters  to  the  code  matrix  are  the 
machine  language  instructions  for  the  operator  that  handle  the  different  cases. 
There  are  separate  code  vectors  for  unary  floating  point  negation  and  logical 
not. 


The  final  class  of  operations  includes  all  the  remaining  operations.  The 
entire  analysis  is  performed  by  one  subroutine,  and  the  production  of  the 
machine  language  is  table  driven  as  it  was  for  the  "dumb"  code  generator. 
However,  the  analysis  is  more  involved  due  to  the  different  modes  the 
arguments  may  attain. 

Most  of  the  analysis  is  independent  of  the  operation  being  performed, 
but  there  are  two  types  of  operations  that  require  special  processing.  The 
first  special  case  involves  the  branch  operations.  The  analysis  for 
determining  the  correct  machine  language  is  identical  to  that  previously 
described  except  that  now  there  is  another  case  to  consider  if  CM  was 
applied  to  the  segment.  This  involves  external  branches  to  the  segment 
entry block. If an invariant code block was appended to the segment entry
block by CM, then these external branches must be to the alternate entry
point and not to the invariant block. For these external branches, the
machine language performs the branch directly through the AENTRY field in
the segment table instead of the CURREP field which is the address of the
invariant block's machine language. If CM did not create an invariant block,
then external branches to the entry block are direct through the CURREP
field, not through the segment driver, since CM is the last optimization that
can be applied.

The  other  special  case  is  for  relational  operators.  This  is  the  only 
other  case  besides  CSE  in  which  a  sequence  of  quads  is  examined  in  order 
to  produce  more  efficient  machine  language.  The  FORTRAN  construct  being 
optimized  is  the  logical  IF  of  the  form: 

IF (E1 ROP E2) S

where ROP is a relational operator and S is a statement. The quads
generated for this construct can be found in Appendix A, Sections A.2.1 and
A.2.3, but it is basically the pair:

(ROP,E1,E2,T)

(OP,T,,)

that is being combined to eliminate the intermediate logical result T, where
OP is either BF, BT, STOPT, EXTST or EXTFT (see Appendix A, Table A.1).
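
The combination of the two quads can be sketched as a small peephole pass. This is a hedged illustration only: the relational op-code names (LT, LE, and so on), the tuple representation of a quad, and the form of the fused quad are assumptions made for the example. It also assumes, as holds for the logical IF, that T is used only by the test quad that immediately follows.

```python
from typing import List, Tuple

Quad = Tuple[str, str, str, str]

RELATIONAL_OPS = {"LT", "LE", "EQ", "NE", "GT", "GE"}   # hypothetical names
TEST_OPS = {"BF", "BT", "STOPT", "EXTST", "EXTFT"}       # named in the text


def fuse_relational_tests(quads: List[Quad]) -> List[Quad]:
    """Merge (ROP,E1,E2,T) followed by (OP,T,,) into one compare-and-test quad."""
    fused: List[Quad] = []
    i = 0
    while i < len(quads):
        quad = quads[i]
        following = quads[i + 1] if i + 1 < len(quads) else None
        if (quad[0] in RELATIONAL_OPS and following is not None
                and following[0] in TEST_OPS and following[1] == quad[3]):
            # The logical temporary T disappears from the fused quad.
            fused.append((f"{following[0]}.{quad[0]}", quad[1], quad[2], ""))
            i += 2
        else:
            fused.append(quad)
            i += 1
    return fused


block = [("LT", "I", "N", "T1"), ("BF", "T1", "", ""), ("ADD", "A", "B", "T2")]
result = fuse_relational_tests(block)
```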

Germane  to  the  generation  of  efficient  optimized  code  is  the  effective 
use  of  the  registers.  Whereas  the  quads  operate  strictly  on  temporaries,  the 
generated  machine  language  instructions  use  registers,  and  it  is  up  to  the 
machine  language  generator  to  control  how  the  registers  are  utilized.  One 
means  of  using  the  registers  effectively  is  for  the  generator  to  remember 
what  variables  and  results  reside  in  which  registers  so  that  those  registers 
can  be  used  to  form  further  results,  thereby  avoiding  redundant  load 
operations. Thus, for example, the fact that a binary operator is commutative is
recognized so the result is formed in the register occupied by one of the
arguments (if either argument is already in a register); or, if a replacement
statement of a result into a simple variable is generated, the variable is
associated with the register so it will not be reloaded if used later.

The  other  means  for  controlling  the  use  of  registers  resides  in  the 
register  allocation  algorithm,  which  is  invoked  whenever  a  register  is  needed. 
The algorithm assigns the least recently used of the 10 working registers.
When a register is needed, the registers (actually the register table) are
searched starting with the last register assigned. First a search for a
register not in use is made. If this fails (i.e., all registers are in use), then
a search for a register with no associated temporary is made, starting with
the first. This search is effectively for a register that only has variables
associated with it. If this fails, then it is necessary to store a register
containing a result. A search is made for a register with an associated
temporary having the minimum number of associated variables. If this fails,
then all registers have a mode of "REG+NUM", so the register with the
smallest number of associated temporaries is selected. Code is generated to
perform the addition and store the result.
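
The four-stage search can be sketched as follows. This is a minimal model under stated assumptions: the `RegEntry` fields mirror only three of the register table fields, the mode strings and the returned reason labels are invented, "starting with the last register assigned" is read as a cyclic scan beginning at that register, and no machine code is emitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RegEntry:
    mode: str = "NONE"   # "NONE", "REG" or "REG+NUM" (mode of the register)
    use: int = 0         # number of associated REG+NUM temporaries
    var_count: int = 0   # number of associated variables


def choose_register(table: List[RegEntry], last: int) -> Tuple[int, str]:
    """Return (register index, reason) following the search order in the text."""
    n = len(table)
    # 1) A register not in use, scanning cyclically from the last one assigned.
    for k in range(n):
        i = (last + k) % n
        if table[i].mode == "NONE" and table[i].var_count == 0:
            return i, "free"
    # 2) A register with no associated temporary (it holds only variables).
    for i, entry in enumerate(table):
        if entry.mode == "NONE":
            return i, "variables only"
    # 3) A register holding a result, with the fewest associated variables.
    holding_result = [i for i, e in enumerate(table) if e.mode == "REG"]
    if holding_result:
        return min(holding_result, key=lambda i: table[i].var_count), "store result"
    # 4) All registers are REG+NUM: take the one with the fewest temporaries.
    return min(range(n), key=lambda i: table[i].use), "add and store"


all_free = [RegEntry() for _ in range(10)]
pick_free = choose_register(all_free, last=3)

busy = [RegEntry("REG", 0, v) for v in (3, 1, 2, 1, 4, 2, 3, 5, 2, 2)]
pick_result = choose_register(busy, last=0)

delayed = [RegEntry("REG+NUM", u, 0) for u in (2, 3, 1, 2, 2, 4, 3, 2, 5, 2)]
pick_delayed = choose_register(delayed, last=0)
```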

There  are  cases  (e.g.,  integer  division)  when  two  consecutive  registers 
are  needed.  There  is  another  form  of  the  register  allocation  algorithm  that  is 
identical  to  the  one  just  described,  but  which  searches  for  two  consecutive 
registers possessing the same properties. If both registers do not have the
same property, another search is made to find at least one with the desired
property. Only if this fails is a search continued for two consecutive
registers with another identical property.

The  "fair"  code  generator  requires  approximately  10K  of  core.  It  takes 
on  the  average  twice  as  long  to  process  a  quad  as  the  "dumb"  code 
generator, i.e., approximately 1200 µs. However, code generated for a basic
block  runs  on  the  average  twice  as  fast  as  the  code  generated  by  the 
"dumb"  code  generator. 

3.2 The System's Optimization States and Their Associated Optimization Counts

Performance  of  the  system  depends  on:  1)  how  the  fusion  optimizer 
forms a machine language segment (homogeneous versus non-homogeneous);
2) what optimizations are applied (individually or in combination) and in what
order; and 3) the optimization counts. The modularity of the system and the
isolation of the code that controls the behavior of the optimizers provide the
ability to change the adaptive strategy easily and thereby produce
operationally different systems.

Approximately  15  different  systems  were  constructed  and  tested  before 
the  final  form  was  determined.  As  each  system  was  tested,  more  insight  into 
the  dynamic  optimization  process  was  gained.  The  performance  of  each 
successive  system  was  analyzed  and  this  led  to  experiments  involving 
variations in the control functions. The final system is a result of this
evolutionary  process. 

The  first  systems  tested  used  homogeneous  fusion;  all  basic  blocks  in  a 
segment  simultaneously  attain  the  same  optimization  state,  which  is  the 
maximum  optimization  state  of  any  basic  block  contained  in  the  new  segment. 
By  studying  the  performance  curves  it  became  apparent  that  performance  was 
not  satisfactory  for  small  execution  times.  It  was  deduced  that  during  the 
early stages of a program's execution, too much optimization was being
applied  too  soon.  The  problem  then  was  to  obtain  satisfactory  performance 
for  small  execution  times  without  degrading  performance  for  medium-to-large 
execution  times. 

The  first  attempt  to  defer  the  optimization  process  was  to  change  the 
optimization  counts  and  keep  the  optimization  states  fixed.  The  optimizations 
and their order of application were: translation of basic blocks to "dumb"
code,  homogeneous  fusion,  CSE  and  CM.  This  approach  did  not  prove  to  be 
sufficient  mainly  because  embedded  segments  tend  to  attain  a  high 
optimization  state  and  thereby  cause  the  covering  segment  to  be  optimized 
too  fast. 

The  next  set  of  systems  used  non-homogeneous  fusion  as  described  in 
Section 3.1.3.1. The results were better than those achieved using
homogeneous fusion but still not satisfactory, for while it improved
performance for small execution times, it degraded performance for large
execution times. The reason was that the optimization state of embedded
segments  became  frozen.  To  compensate  for  this,  all  embedded  segments 
were  advanced  to  their  next  optimization  state  before  an  optimization  (CSE  or 
CM)  was  performed.  This  improved  the  performance  for  large  execution 
times,  but  degraded  the  performance  for  medium  execution  times.  Therefore, 
another  approach  was  needed  for  controlling  the  optimization  rate  of 
embedded  segments. 

We decided to perform a pre-analysis on the loop structure of the
program before execution started, and to change the optimization state and
optimization count of certain basic blocks from their initialized values
according to their depth of nesting in the loop structure. We first attempted
to increase the optimization rate of innermost loops since they are executed
the most frequently and therefore should be optimized first. We hoped the
additional optimization time would be negligible compared to the savings in
execution time. Only innermost loops consisting of two or fewer basic blocks
were considered. Three different ways in which the initial optimization of
these innermost segments could be allowed to proceed were considered:

1)  The  translation  of  a  basic  block’s  quads  to  "dumb"  code 
if  the  basic  block  is  executed  more  than  once. 

2)  Fusion  to  "dumb"  code  if  any  basic  block  in  a  segment 
is  executed  more  than  once. 

and  3)  Total  optimization  of  a  segment  if  any  basic  block  in  it 
is  executed  more  than  once. 

The results were encouraging, with the second of the three approaches being
the most promising. However, performance for medium execution times was
still being degraded because the optimization rates of the non-innermost
segments were the same. Therefore, the loop analyzer was modified to
recognize innermost and outermost segments, thereby partitioning the
segments into three classes. The outermost segment's optimization count for
the last optimization state was made smaller than that for other embedded
segments because its execution rate is slower than that for these other
segments and therefore it should not be executed the same number of times
before being totally optimized. This final modification produced the most
favorable results.


The  optimization  states  for  the  final  system  on  whose  performance  we 
shall  report  in  the  next  chapter  were  as  follows: 

0:  translate  the  interpretive  code  for  the  basic  block  to 
"dumb"  machine  language. 

1:  perform  a  non-homogeneous  fusion  of  the  basic  block 
into  a  segment.  Basic  blocks  in  interpretive  code  are 
translated  to  "dumb"  machine  language;  blocks  in  machine 
language  are  moved  as  is  with  their  branches 
retranslated. 

2: perform code motion on the segment. Before the
optimization is performed, the CSE algorithm is
performed on all basic blocks in the segment. After the
optimization, the quads of each basic block are
translated to "fair" machine language.

Note  that  CSE  is  not  a  separate  optimization,  but  is  combined  with  CM. 
The  reason  was  that  the  time  to  generate  the  machine  language  segment 
using  the  "fair"  code  generator  is  appreciably  larger  than  the  time  to  perform 
the CSE or CM algorithm. Therefore, the combined time to perform CSE and
CM separately is much greater than the time to perform the combination,
because the machine language segment must be generated twice.


The  optimization  counts  associated  with  each  of  the  optimization  states 
depend on the loop structure of the program. In the loop classifications that
follow,  the  triplet  (C0,C1,C2)  represents  the  optimization  counts  for 
optimization  states  0,  1  and  2  respectively: 

1) innermost segments (i.e., loops): (0,1,50). Thus
innermost segments are fused into "dumb" machine
language if executed once, then totally optimized. The
loop analyzer initializes the optimization state and count
for basic blocks belonging to an innermost segment to 1.

2) outermost segments: (6,15,n) where n = 10 if the
length of the segment (in basic blocks) is ≤ 10,
otherwise 2*length.

3) other segments: (6,15,200).

4) entire subprogram: (6,15,n) where n is the same as in
2). However, CM is not performed as the last
optimization; instead CSE is applied to all the basic
blocks in the subprogram. It makes no sense to remove
invariant quads out of a subprogram because the entire
subprogram is always executed when called. Thus the
entire subprogram is considered a segment and
processed as any other segment with respect to
optimization. What constitutes an outermost loop inside
a subprogram depends on whether the subprogram is
called from within a loop. It is assumed that this is
always the case, for if this assumption is not made,
experiments indicate that system performance is
degraded.
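
The classification above amounts to a small lookup, sketched here in Python. The function names and the string labels for the segment classes are hypothetical, and the threshold comparison for n is taken to be "less than or equal to 10", which is an assumption about a symbol that is hard to read in the source.

```python
from typing import Tuple


def final_count(length_in_blocks: int) -> int:
    """Third count n for outermost segments and entire subprograms."""
    return 10 if length_in_blocks <= 10 else 2 * length_in_blocks


def optimization_counts(segment_class: str, length_in_blocks: int) -> Tuple[int, int, int]:
    """Return the triplet (C0, C1, C2) for a segment of the given class."""
    if segment_class == "innermost":
        return (0, 1, 50)
    if segment_class in ("outermost", "subprogram"):
        return (6, 15, final_count(length_in_blocks))
    if segment_class == "other":
        return (6, 15, 200)
    raise ValueError(f"unknown segment class: {segment_class}")


inner = optimization_counts("innermost", 2)
short_outer = optimization_counts("outermost", 8)
long_outer = optimization_counts("outermost", 12)
nested = optimization_counts("other", 5)
```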

Since the third optimization count is determined prior to program
execution, it must be saved. This is accomplished by appending another field
to the segment table.


Finally,  an  explanation  of  how  the  optimization  counts  were  determined 
is  in  order.  The  final  values  given  above  are  based  on  findings  obtained  by 
experimenting  with  a  system  that  employed  homogeneous  fusion. 
Corresponding  to  that  system’s  four  optimization  states,  there  were  four 
optimization counts: OP0, OP1, OP2, and OP3. Initially, basic blocks and
segments  were  treated  uniformly.  The  optimization  counts  for  those  in  the 
same  optimization  state  were  assigned  a  constant  value  that  did  not  depend 
on  any  attribute  of  the  basic  block  or  segment.  The  values  selected  and  the 
reasons  were: 

1) OP0 = 10. OP0 controls when a basic block is translated
to "dumb" code. Since translation time is 550 µs/quad
and interpretation time is approximately 25 µs/quad, an
upper bound on the number of times the basic block
should be interpreted before being translated is
550/25 = 22 times. However, this calculation does not
take into consideration the execution time of the new
representation. If the basic block continues to be
executed, it would pay to translate sooner because its
execution time will be less. Thus, a fraction of 22 was
selected, viz., approximately 1/2.

2) OP1 = 15. Determining OP1 is harder, because the
amount of time required to perform the homogeneous
fusion algorithm is a function of the length of the
segment and cannot be determined a priori. Therefore,
the value chosen is based on the fact that fusion
should be performed as soon as possible, but not
before the benefits of being in "dumb" code could be
felt, i.e., the effort required to translate quads to
"dumb" code should not be wasted.

3) OP2 = 35. Since CSE produces code that is at least
twice as fast as that produced by fusion to "dumb"
code, a value was chosen which is approximately twice
OP1.

4) OP3 = 70. Because the benefits of going from CSE-"fair"
code to CM-"fair" code are not as great as going from
fused "dumb" code to CSE-"fair" code, a value was
chosen that could delay performing CM for a
reasonable amount of time.
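
The reasoning behind OP0 can be checked with a short calculation. The sketch below uses the per-quad figures quoted above (550 µs to translate, about 25 µs to interpret) together with the earlier statement that "dumb" code runs about 9 times faster than interpretation. The cost model itself, in which a block is interpreted up to a threshold, translated once, and then run as machine code, is a simplifying assumption made for illustration.

```python
T_TRANSLATE = 550.0   # µs per quad to translate to "dumb" code (from the text)
T_INTERPRET = 25.0    # µs per quad to interpret (from the text)
SPEEDUP = 9.0         # "dumb" code is about 9 times faster than interpretation


def cost_per_quad(executions: int, threshold: int) -> float:
    """Total time per quad if the block is translated after `threshold` runs."""
    if executions <= threshold:
        return executions * T_INTERPRET          # never translated
    interpreted = threshold * T_INTERPRET
    compiled = (executions - threshold) * T_INTERPRET / SPEEDUP
    return interpreted + T_TRANSLATE + compiled


upper_bound = T_TRANSLATE / T_INTERPRET          # the 22 quoted in the text
short_run = cost_per_quad(5, 10)                 # block never reaches the threshold
long_run_at_10 = cost_per_quad(1000, 10)
long_run_at_22 = cost_per_quad(1000, 22)
```

Under this model a frequently executed block is cheaper overall with the threshold at 10 than at 22, which is consistent with the choice of roughly half the upper bound.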

In  an  attempt  to  improve  performance  for  small  execution  times  these 
optimization  counts  were  varied  slightly,  with  no  appreciable  results.  Since 
OP3  was  thought  to  be  the  most  critical  factor,  other  functions  for 
determining  it  were  tried  based  on  the  length  of  the  segment  measured  in 
either  number  of  quads  or  number  of  basic  blocks,  e.g.,  taking  the  natural 
logarithm  or  a  constant  multiple.  The  most  promising  was  taking  a  constant 
multiple  (2)  of  the  length  measured  in  basic  blocks. 

It  became  apparent  that  constant  optimization  counts  were  not  sufficient 
to  significantly  improve  performance.  Further  improvements  were  made  by 
changing to non-homogeneous fusion, combining CSE with CM, and not treating
segments uniformly, but classifying them according to their level of nesting in
a loop structure. This necessitated adjusting the optimization counts
accordingly. The values chosen are given above; the reasons are:

1) innermost segments: C1 = 1, for there is no reason to
perform any optimization if the segment is not executed
at least once. To totally optimize the segment after it
is executed once results in too much optimization being
applied too soon. Therefore, total optimization is
delayed, and since CSE was combined with CM,
C2 = OP1 + OP2 = 50.

2) outermost segments and entire subprograms: C0 = OP0 = 10
proved to be too high a value, while C1 = 5 was minimal.
Therefore, C0 = 6 was chosen. C1 = OP1 = 15 for the same
reasons given for OP1. C2 is a multiple of the length
of the segment measured in basic blocks because this
function was experimentally the most promising for
determining when the final optimization should be
performed.

3) other segments: C2 = 200 because OP3 was considered to
be too small for a segment that is in a loop structure
at least three levels deep. Based on an analysis of
how many times such a loop could be executed in such
a loop structure, 200 seemed a reasonable choice.

It  is  unfortunate  that  the  optimization  counts  were  determined 
heuristically  and  a  more  theoretical  basis  was  not  found.  But  the  excellent 
performance  results  presented  in  the  next  chapter  speak  for  themselves. 

Figure 3.1: Structural Organization of the Adaptive FORTRAN System


Chapter  IV 

Validation  and  Experimental  Results 

In this chapter we present the experimental evidence which
demonstrates that dynamic optimization is a workable and valid technique.

The  demonstration  strategy  consists  of  implementing  the  Adaptive  FORTRAN 
system  described  in  the  previous  chapter,  and  measuring  its  performance  on 
an  appropriate  program  mix. 

In order to evaluate Adaptive FORTRAN’s performance measurements, it
is necessary to compare the results with those obtained by running the same
set of test programs under other types of FORTRAN compilers, viz., WATFIV,
FORTRAN-IV G and FORTRAN-IV H. To do this for various machines would
not, unfortunately, provide a meaningful comparison due to the differences in
the machines and their compilers. For the comparisons to be meaningful, the
same compiler, optimizers, machine language generators and object machine
should be used.

Therefore, the approach taken is to transform the Adaptive FORTRAN
system into systems that resemble those three real compilers. It is against
those three compilers, plus the DEC PDP-10 FORTRAN compiler (F40), that the
Adaptive FORTRAN system is compared. We will present the results of
running the test programs under the five different systems in tabular and
graphical form, and discuss their implications.

4.1  Comparative Compiler Systems


In order to have a basis for evaluating how dynamic optimization
compares to current compilers, it is necessary to run a number of test
programs under different compiler systems and to compare the performance
measurements. For FORTRAN, there are three well-known classes of
compilers that might be used for comparison purposes:

1)  WATFIV: a one pass compiler that compiles
directly to core. It is very fast and
the code produced is fairly decent.

2)  FORTRAN-IV G: usually a multi-pass compiler that
produces a relocatable object module
that must be loaded by a standard
system loader. It compiles relatively
fast and generates code that is better
than that produced by WATFIV.
Optimization is at the basic block level.
The generated code is comparable to
that produced by CSE and the "fair"
code generator.

F40, the PDP-10 FORTRAN compiler,
can be classified as a G-type compiler
except that optimization is at the
statement level. It compiles relocatable
code to a disk file. A standard system
loader creates in core an absolute load
module from one or more relocatable
object modules. This load module can
be saved as a file on disk and called
for execution.

3)  FORTRAN-IV H: a multi-pass optimizing compiler that
optimizes the entire program at
compile-time. The output is a
relocatable object module. The compiler
is usually a few times slower than
FORTRAN-IV G. However, the object
code is usually two or three times
faster than that produced by
FORTRAN-IV G (see [Low69]).

There are a number of practical problems in making the comparisons.
First, not many computers have all three compilers available. Second, the
differences in characteristics of various computers (e.g., speed, word size, and
instruction set) must be taken into account; this complicates the comparisons.
Finally, the differences between the compilers themselves, e.g., the parsing
and code generation techniques employed, the type of machine language
generated, and the run-time support package, must be taken into account.

The  ideal  situation  would  be  to  have  all  the  compiler  systems  run  on 
the  same  machine  and  to  use  the  same  compiling  techniques,  optimizers, 
machine  language  generators  and  run-time  support  package.  The  construction 
of  these  compiler  systems  was  considered  to  be  too  large  an  undertaking. 
Therefore,  a  more  expedient  approach  was  taken  in  which  the  Adaptive 
FORTRAN  system  (AF)  was  transformed  to  resemble  each  of  the  other  three 
compilers. It was easy to make the necessary changes because of the way
AF was constructed (see Sections 3.1.2 and 3.1.3). The main discrepancy
between the transformed systems and the actual compilers lies not in the
type of code produced, but in the way it is produced. Each transformed
system  uses  the  Adaptive  FORTRAN  compiler  to  translate  FORTRAN  source  text 
into  quads.  After  loading  the  quads,  but  before  starting  execution,  the  quads 
are  translated  to  the  machine  language  form  that  most  resembles  the  code 
produced  by  the  compiler  being  emulated.  We  feel  this  discrepancy  in  no 
way  alters  the  validity  of  the  test  results,  since  the  use  of  a  consistent 
approach  does  not  bias  the  results. 

The three transformed systems and the manner in which they produce
code are:


1)  AFW: resembles WATFIV. The entire program is
translated to "dumb" code before execution starts.
A true WATFIV compiler does not produce quads,
but compiles machine language directly. Like
WATFIV, AFW compiles in one pass directly to
core. But whereas WATFIV’s machine language is
absolute and requires some patching before
execution starts, AFW produces relocatable quads
which must be loaded. Therefore, the
compiler-loader phases of a WATFIV compiler
probably would be slightly faster than those for
AFW.

2)  AFG: resembles FORTRAN-IV G. CSE is performed on
all the program’s basic blocks and then the entire
program is translated to "fair" code before the
start of execution. A FORTRAN-IV G compiler
usually produces relocatable machine language
directly to a disk file. The absolute load module
is created from the relocatable object modules by
a standard system loader. Since these modules
are on disk, load-time should be greater than that
for AFG, which uses a specialized in-core loader
(see Section 3.1.2.2). The compile-time for
FORTRAN-IV G should be comparable to the
combined time required by AFG to compile and
load the program and perform the translation of
the quads to machine language. Therefore, the
compiler-loader phases of AFG should be slightly
faster than those for FORTRAN-IV G.

3)  AFH: resembles FORTRAN-IV H. All optimizations are
applied to the entire program which is then
translated to "fair" code before the start of
execution. CSE is first performed on each basic
block in the entire program. Then the segments
are formed via fusion starting with the innermost
ones and working outwards. The list of segments
and their order of processing is given to the
system, not deduced by it. As each segment is
formed, CM is performed on it. After all
segments are formed, the entire program is
translated to "fair" code. A true FORTRAN-IV H
compiler also produces an internal form such as
quads which its optimizers process (cf. [Low69]).
The compiler-loader phases of AFH should be
slightly faster than those for FORTRAN-IV H for the
same reasons given above for AFG.

These three compiler systems form the basis against which AF is
compared. The performance of each system was measured by running the
same set of test programs under each.
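
The differences between the three transformed systems can be summarized as a small configuration table. The sketch below is illustrative only: the `Variant` type, the step strings and `pre_execution_plan` are hypothetical names introduced here, and the steps simply restate the descriptions of AFW, AFG and AFH given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    resembles: str        # the real compiler being approximated
    optimizations: tuple  # work done on the whole program before execution
    code_quality: str     # "dumb" or "fair" machine code


VARIANTS = {
    "AFW": Variant("WATFIV", (), "dumb"),
    "AFG": Variant("FORTRAN-IV G", ("CSE on every basic block",), "fair"),
    "AFH": Variant(
        "FORTRAN-IV H",
        ("CSE on every basic block",
         "form segments by fusion, innermost first, applying CM to each"),
        "fair",
    ),
}


def pre_execution_plan(name: str) -> list:
    """List the steps a transformed system performs before execution starts."""
    variant = VARIANTS[name]
    steps = ["compile source to quads", "load quads"]
    steps.extend(variant.optimizations)
    steps.append(f'translate entire program to "{variant.code_quality}" code')
    return steps


if __name__ == "__main__":
    for name in VARIANTS:
        print(name, "->", pre_execution_plan(name))
```

All three variants share the compile and load steps, so any difference in their measured times comes from the optimization and translation steps alone.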

4.2  The  Test  Programs 

In order to draw meaningful conclusions about the performance of AF, it
is necessary to run a number of test programs under it that have different
characteristics. Care must be exercised in selecting the test programs to
avoid biasing the results. For any compiler system, it is always possible to
construct a program that makes it look miserable or one that makes it look
good. To ensure that the test programs are representative of the type of
programs written in the real world, both the published literature and students
were used as sources.

A number of criteria were used to select the test programs from the
potential candidates; they were designed to test if the usage of AF is
restrictive. The main criterion was to select programs with differing loop
structures, e.g., a different number of loops, loop lengths (measured in basic
blocks) and loop nestings. The reason was that we wanted to test AF’s
performance both on the class of programs it was designed for (i.e.,
programs for which 5% of the code accounts for 50% of the execution time)
and on those that do not fall into this classification.

Second,  we  wanted  programs  that  have  parameter(s)  that  can  be  varied 
to  control  their  execution  time.  This  allows  us  to  study  the  performance  of 
AF  for  small,  medium  and  large  execution  times,  and  determine  if  performance 
is  a  function  of  the  execution  time. 


Finally, we wanted programs that were compute bound in order to do a
worst-case analysis. I/O bound programs were not selected because the I/O
handlers are not part of the user’s program and therefore cannot be
optimized by the system, and if the program performs any I/O, the I/O time
is a constant for a fixed test point regardless of the version of the
experimental compiler system being run under. Thus, the analysis of the
results is unaltered since it is the difference between the measurements that
is relevant when making comparisons.


The four test programs selected (see Appendix B for a listing of the
source) and their characteristics are:

1)  EE: A student electrical engineering problem.

a)  Control parameters: C2I and C3I, increments
that control the accuracy of the results C2
and C3 respectively. For the test runs, C2I
was held fixed while C3I was allowed to
vary.

b)  Program units: Main program unit only

c)  Number of statements: 51

d)  Number of basic blocks: 9

e)  Number of individual loops: 1

f)  Loop size: 7 basic blocks

g)  Loop nesting: 1 single level

This program is to typify the type of
program written by a student. It was obtained
from an EE student [McW72].

2)  SIEVE2: A prime number generator [Cha67].

a)  Control parameter: K, the number of primes
to be generated

b)  Program units: Main program unit only

c)  Number of statements: 86

d)  Number of basic blocks: 27

e)  Number of individual loops: 7

f)  Loop sizes (in basic blocks):
1, 2(2), 4, 5(2), 25 †

g)  Loop nesting:
1 single level
5 double level
1 triple level

This algorithm is a modification of Chartres’
algorithm in that it generates the first K primes
instead of all the primes ≤ M.

3)  LES:  A  linear  equation  solver  [For67  and  Mol72]. 

a)  Control  parameter:  N,  the  number  of 

variables 

b)  Program units: Main program unit plus 2
subprogram units

c)  Number of statements: 97

d)  Number  of  basic  blocks:  45 

MAIN:  15 
DECOMP:  20 
SOLVE:  10 

e)  Number  of  individual  loops:  13 

MAIN:  4 
DECOMP:  5 
SOLVE:  4 

f)  Loop  sizes(in  basic  blocks): 

MAIN:  1,2, 4(2) 

DECOMP:  1(2), 3, 4, 18 
SOLVE:  1(2), 3(2) 


†  The notation is to be interpreted as follows: for the 7 individual loops, one
is of size 1, two of size 2, one of size 4, two of size 5, and one of size
25.


g)  Loop nesting:

MAIN:  2  single  level 
2  double  level 
DECOMP:  1  single  level 
3  double  level 
1  triple  level 
SOLVE:  2  single  level 
2  double  level 

The original algorithm given in the textbook
by Forsythe and Moler [For67] consists of two
subroutines. However, Moler later published new
subroutines that were a modification of, and
replacement for, the corresponding original
routines [Mol72]. These were the routines used
in the program. The test matrices were
generated by the program and correspond to
Example 3.6 in the book by Gregory and
Karney [Gre69] (see Appendix B, Section B.2).

4)  QZ: An eigenvalue problem [Mol73].

a)  Control  parameter:  N,  the  size  of  the  square 
input  matrix 

b)  Program  units:  Main  program  unit  plus  9 
subprogram  units 

c)  Number  of  statements:  654 

d)  Number  of  basic  blocks:  323 

MAIN:  9 
QZ:  7 
QZHES:  66 
QZIT:  97 
QZVAL:  49 
QZVEC:  77 
HSH3:  5 
HSH2:  5 
CHSH2:  5 
CDIV:  3 

e)  Number  of  individual  loops:  51 

MAIN:  2 
QZHES:  19 
QZIT:  12 
QZVAL:  4 
QZVEC:  14 

HSH3,  HSH2,  CHSH2,  CDIV:  0 

f)  Loop sizes (in basic blocks):

MAIN:   2, 5
QZHES:  2(12), 3, 5, 7(2), 21, 23, 32
QZIT:   2(7), 3, 9, 10, 44, 66
QZVAL:  2(3), 42
QZVEC:  2(6), 3(2), 5, 7, 18, 19(2), 48

g)  Loop nesting:

MAIN:   1 single level
        1 double level
QZHES:  3 single level
        7 double level
        9 triple level
QZIT:   3 single level
        2 double level
        7 triple level
QZVAL:  1 single level
        3 double level
QZVEC:  3 single level
        7 double level
        4 triple level

This  program  was  obtained  from  Stewart  and 
is  described,  but  not  given  in  his  paper  with 
Moler  [Mol73].  The  test  matrices  are  generated 
by  the  program  and  were  suggested  by  Stewart 
(see  Appendix  B,  Section  B.5).  This  algorithm  is 
interesting  in  that  the  intermediate  quantities 
produced  by  the  program  may  not  be  the  same 
owing  to  rounding  errors.  Consequently,  the 
execution  times  are  theoretically  not  strictly 
comparable.  However,  for  practical  purposes 
they  are,  i.e.,  the  timings  depend  in  a  uniform 
manner  on  the  size  of  the  matrix. 

Each of these test programs was run under AF, AFW, AFG and AFH,
plus F40, the FORTRAN-IV compiler on the PDP-10. We now present the
results of these test runs.


4.3  The  Test  Results 


The performance of each compiler system is measured by obtaining the
total run-time for a test program as a function of its control parameter,
where total run-time is the sum of compilation time, load time and execution
time. The timings were made on a PDP-KA10 computer system with Ampex
core having a 1.8 μs read/write cycle. In order to obtain accurate timings, it
is necessary to run the compiler systems with no load on the computer
system, for timings are sensitive to the system load. During a test run, the
computing environment consisted of the monitor, the I/O handlers and the
particular compiler system being tested. Identical computer runs produced the
same timings, so there is no statistical fluctuation in the results. A 10 μs clock
was used to make the timings, which are given here in seconds.
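
As a worked instance of this definition, the sketch below adds the three components for two entries of the EE program at C3I = 0.5, using the compilation, load and execution times reported in Tables 4.1a and 4.1b. The function name `total_run_time` is a hypothetical helper introduced only for illustration.

```python
def total_run_time(compile_s: float, load_s: float, execute_s: float) -> float:
    """Total run-time in seconds: compilation time + load time + execution time."""
    return round(compile_s + load_s + execute_s, 2)


if __name__ == "__main__":
    # EE at C3I = 0.5 (values taken from Tables 4.1a and 4.1b).
    f40 = total_run_time(3.37, 2.25, 41.40)
    af = total_run_time(0.52, 0.03, 31.04)
    print("F40:", f40)
    print("AF: ", af)
```

The two sums reproduce the corresponding F40 and AF entries of Table 4.1c (47.02 and 31.59 seconds).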


The results of the test runs are presented in tabular and graphical
form. There are five tables for each program (Tables 4.1-4.4):

1)  Compiler and Loader Timing Statistics

The following statistics are tabulated for F40, AFW,
AFG, AFH, and AF:

a)  Compilation time,

b)  Load time,

c)  Total of a) and b),

d)  Optimization time, i.e., that part of the
compilation time spent optimizing the program,

e)  The percent of the compilation time spent
optimizing, i.e.,

(optimization time/compilation time)*100.

2)  Execution Times

The execution times of the program for F40, AFW,
AFG, AFH and AF are tabulated as a function of the
control parameter. Data points were taken until the
amount of time spent optimizing the program became
constant, i.e., until no more optimizations were
performed.

3)  Total Run-time

The total run-time (compilation time plus load time
plus execution time) is tabulated as a function of the
control parameter for F40, AFW, AFG, AFH and AF.

4)  Total Run-time Ratios

This table indicates the relative speed of AF as
compared to each of the compiler systems. The ratio
of total run-times is tabulated as a function of the
control parameter.

5)  AF Optimization Statistics

The following statistics are tabulated as a function
of the control parameter:

a)  Execution time,

b)  Optimization time, i.e., that part of the
execution time spent dynamically optimizing
the program,

c)  The percent of the execution time spent
optimizing the program, i.e.,

(optimization time/execution time)*100.

The third table of total run-times represents the measurements for
comparing the performance of each compiler system against AF. In order to
compare the systems visually, this table is presented in graphical form for
each test program (Figures 4.1-4.4). The coordinates of the graph are the
total run-time versus the control parameter. The data for each of the
compiler systems is plotted on the same set of axes thereby producing a set
of performance curves that can easily be compared.
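
The two derived quantities in the fourth and fifth tables are simple quotients. The sketch below recomputes one entry of each for EE at C3I = 0.5, taking the percentage relative to execution time as in the "% of Execution" column of Table 4.1e. The function names are hypothetical helpers introduced for illustration.

```python
def run_time_ratio(other_total_s: float, af_total_s: float) -> float:
    """Ratio of another system's total run-time to AF's; above 1 means AF is faster."""
    return round(other_total_s / af_total_s, 2)


def percent_optimizing(optimization_s: float, execution_s: float) -> float:
    """Share of AF's execution time spent on dynamic optimization, in percent."""
    return round(optimization_s / execution_s * 100, 2)


if __name__ == "__main__":
    # EE at C3I = 0.5: totals from Table 4.1c, AF statistics from Table 4.1e.
    print("F40/AF:", run_time_ratio(47.02, 31.59))
    print("AFH/AF:", run_time_ratio(31.58, 31.59))
    print("% of execution spent optimizing:", percent_optimizing(0.31, 31.04))
```

A ratio just under 1, as for AFH/AF here, means that AF is marginally slower than the fully optimizing variant at that test point.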




In  order  to  further  demonstrate  the  effects  of  dynamic  optimization,  we 
constructed  another  compiler  system,  AFI,  which  performs  no  optimizations,  but 
runs  the  program  interpretively.  Table  4.5  shows  the  results  of  the  initial 
test  points  for  the  test  programs  QZ  and  LES.  These  results  are  plotted  on 
the  corresponding  graphs. 

Finally,  we  were  interested  in  studying  the  behavior  of  AF  for  very 
small  execution  times  because  the  optimization  time  then  constitutes  a  large 
percentage  of  the  execution  time.  We  wanted  to  see  how  the  fraction  of 
execution time devoted to optimization grows and finally peaks. Refined
measurements  were  made  for  the  test  programs  QZ  and  LES,  and  the  results 
are  tabulated  in  Table  4.6.  The  results  also  were  used  in  accurately  plotting 
the  initial  portion  of  the  corresponding  performance  curves. 

Table 4.1a  Compiler and Loader Timing Statistics for EE

        Compilation   Load             Optimization   % of
        Time          Time    Total    Time           Compilation

F40        3.37       2.25     5.62      ----           ----
AFW         .59        .03      .62       .07          11.86
AFG         .77        .03      .80       .25          32.47
AFH         .81        .03      .84       .29          35.80
AF          .52        .03      .55      ----           ----

Table 4.1b  Execution Times for EE

C3I      F40      AFW      AFG      AFH       AF

0.5     41.40    35.13    32.32    30.74    31.04
1.0     20.75    17.70    16.55    15.51    15.81
2.0     10.63     9.09     8.50     7.96     8.27
4.0      5.48     4.69     4.39     4.12     4.42
6.0      3.85     3.28     3.08     2.88     3.19
8.0      3.02     2.58     2.42     2.27     2.57

Table 4.1c  Total Run-time for EE

C3I      F40      AFW      AFG      AFH       AF

0.5     47.02    35.75    33.63    31.58    31.59
1.0     26.37    18.32    17.35    16.35    16.36
2.0     16.25     9.71     9.30     8.80     8.82
4.0     11.10     5.31     5.19     4.96     4.97
6.0      9.47     3.90     3.88     3.72     3.74
8.0      8.64     3.20     3.22     3.11     3.12

Table 4.1d  Total Run-time Ratios for EE

C3I     F40/AF   AFW/AF   AFG/AF   AFH/AF

0.5      1.49     1.13     1.06      .99
1.0      1.61     1.12     1.06      .99
2.0      1.84     1.00     1.05      .99
4.0      2.23     1.07     1.04      .99
6.0      2.53     1.04     1.04      .99
8.0      2.77     1.03     1.03      .99

Table 4.1e  AF Optimization Statistics for EE

        Execution   Optimization   % of
C3I     Time        Time           Execution

0.5      31.04         .31           1.00
1.0      15.81         .31           1.96
2.0       8.27         .31           3.75
4.0       4.42         .31           7.01
6.0       3.19         .31           9.72
8.0       2.57         .31          12.06


Table 4.2a  Compiler and Loader Timing Statistics for SIEVE2

        Compilation   Load             Optimization   % of
        Time          Time    Total    Time           Compilation

F40        4.45       2.43     6.88      ----           ----
AFW         .77        .17      .94       .11          14.29
AFG         .89        .17     1.06       .23          25.84
AFH        1.03        .17     1.20       .37          30.10
AF          .66        .17      .83      ----           ----

Table 4.2b  Execution Times for SIEVE2

   K      F40      AFW      AFG      AFH       AF

  10      .07      .06      .06      .02      .12
  20      .07      .07      .06      .02      .15
  30      .07      .07      .06      .02      .20
  40      .07      .08      .07      .03      .21
  50      .07      .08      .07      .03      .47
  60      .07      .09      .08      .03      .47
  70      .08      .10      .08      .04      .47
  80      .10      .11      .09      .04      .48
  90      .10      .11      .09      .05      .48
 100      .10      .12      .10      .05      .49
 200      .18      .23      .16      .12      .55
 300      .28      .37      .25      .21      .64
 400      .38      .52      .34      .30      .73
 500      .52      .70      .44      .41      .83
 600      .65      .89      .56      .52      .95
 700      .78     1.09      .68      .65     1.07
 800      .55     1.29      .80      .77     1.19
 900     1.10     1.51      .93      .91     1.33
1000     1.27     1.75     1.07     1.05     1.46

Table 4.2c  Total Run-time for SIEVE2

   K      F40      AFW      AFG      AFH       AF

  10     6.95     1.00     1.12     1.22      .95
  20     6.95     1.01     1.12     1.22      .98
  30     6.95     1.01     1.12     1.22     1.03
  40     6.95     1.02     1.13     1.23     1.04
  50     6.95     1.02     1.13     1.23     1.30
  60     6.95     1.03     1.14     1.23     1.30
  70     6.96     1.04     1.14     1.24     1.30
  80     6.98     1.05     1.15     1.24     1.31
  90     6.98     1.05     1.15     1.25     1.31
 100     6.98     1.06     1.16     1.25     1.32
 200     7.06     1.17     1.22     1.32     1.38
 300     7.16     1.31     1.31     1.41     1.47
 400     7.26     1.46     1.40     1.50     1.56
 500     7.40     1.64     1.50     1.61     1.66
 600     7.53     1.83     1.62     1.72     1.78
 700     7.66     2.03     1.74     1.85     1.90
 800     7.83     2.23     1.86     1.97     2.02
 900     7.98     2.45     1.99     2.11     2.16
1000     8.15     2.69     2.13     2.25     2.29

Table 4.2d  Total Run-time Ratios for SIEVE2

   K     F40/AF   AFW/AF   AFG/AF   AFH/AF

  10      7.32     1.05     1.18     1.28
  20      7.09     1.03     1.14     1.24
  30      6.75      .98     1.09     1.18
  40      6.68      .98     1.08     1.18
  50      5.35      .73      .87      .95
  60      5.35      .79      .88      .95
  70      5.35      .80      .88      .95
  80      5.33      .80      .88      .95
  90      5.33      .80      .88      .95
 100      5.29      .80      .88      .95
 200      5.11      .85      .88      .96
 300      4.87      .90      .89      .96
 400      4.65      .94      .90      .96
 500      4.46      .99      .90      .97
 600      4.23     1.03      .91      .97
 700      4.03     1.07      .92      .97
 800      3.88     1.10      .92      .98
 900      3.69     1.13      .92      .98
1000      3.56     1.17      .93      .98

Table 4.2e  AF Optimization Statistics for SIEVE2

        Execution   Optimization   % of
   K    Time        Time           Execution

  10      .12          .05          41.67
  20      .15          .07          46.67
  30      .20          .11          55.00
  40      .21          .11          52.38
  50      .47          .37          78.72
  60      .47          .37          78.72
  70      .47          .37          78.72
  80      .48          .37          77.08
  90      .48          .37          77.08
 100      .49          .37          75.51
 200      .55          .37          67.27
 300      .64          .37          57.81
 400      .73          .37          50.68
 500      .83          .37          44.58
 600      .95          .37          38.95
 700     1.07          .37          34.58
 800     1.19          .37          31.09
 900     1.33          .37          27.82
1000     1.46          .37          25.34

Table 4.3a  Compiler and Loader Timing Statistics for LES

        Compilation   Load             Optimization   % of
        Time          Time    Total    Time           Compilation

F40        6.68       2.26     8.88      ----           ----
AFW         .99        .10     1.09       .17          17.17
AFG        1.20        .10     1.30       .38          31.67
AFH        1.30        .10     1.40       .48          36.92
AF          .82        .10      .92      ----           ----

Table 4.3b  Execution Times for LES

  N      F40      AFW      AFG      AFH       AF

  5      .08      .09      .10      .02      .25
 10      .25      .25      .21      .11      .56
 15      .65      .62      .48      .31      .85
 20     1.42     1.31      .98      .68     1.36
 25     2.62     2.42     1.78     1.27     2.04
 30     4.37     4.04     2.94     2.12     2.90
 35     6.82     6.29     4.56     3.29     4.14
 40    10.07     9.25     6.69     4.83     5.68
 45    14.08    13.04     9.41     6.79     7.65
 50    19.17    17.75    12.79     9.21    10.08
 55    25.32    23.48    16.91    12.16    13.03
 60    32.68    30.32    21.83    15.68    16.70
 65    41.33    38.39    27.62    19.81    20.84
 70    51.38    47.78    34.38    24.62    25.66

Table 4.3e  AF Optimization Statistics for LES

       Execution   Optimization   % of
  N    Time        Time           Execution

  5      .25          .09          36.00
 10      .56          .29          51.79
 15      .85          .34          40.00
 20     1.36          .45          33.09
 25     2.04          .54          26.47
 30     2.90          .54          18.62
 35     4.14          .61          14.73
 40     5.68          .61          10.74
 45     7.65          .61           7.97
 50    10.08          .61           6.05
 55    13.03          .61           4.68
 60    16.70          .77           4.61
 65    20.84          .77           3.69
 70    25.66          .77           3.00

Table 4.4a  Compiler and Loader Timing Statistics for QZ

        Compilation   Load             Optimization   % of
        Time          Time    Total    Time           Compilation

F40       67.00       3.83    70.83      ----           ----
AFW       10.12        .58    10.70      1.65          16.30
AFG       13.58        .58    14.16      5.11          37.63
AFH       15.16        .58    15.74      6.69          44.13
AF         8.47        .58     9.05      ----           ----

Table 4.4b  Execution Times for QZ

  N       F40       AFW       AFG       AFH        AF

  5       .45       .62       .52       .74      2.29
 10      2.70      3.02      2.03      2.04      4.81
 15      7.42      7.85      4.94      4.46      7.63
 20     15.42     15.89      9.69      8.34     12.02
 25     30.17     30.70     18.33     15.39     20.32
 30     48.13     48.43     28.63     23.65     23.63
 35     73.62     73.57     43.12     35.22     40.34
 40    109.75    109.05     63.38     51.41     56.74
 45    149.85    150.91     87.33     70.42     75.81
 50    204.45    200.21    115.41     92.58     98.04

Table 4.4c  Total Run-time for QZ

  N       F40       AFW       AFG       AFH        AF

  5     71.28     11.32     14.68     16.48     11.34
 10     73.53     13.72     16.19     17.78     13.86
 15     78.25     18.55     19.09     20.20     16.68
 20     86.25     26.59     23.85     24.08     21.07
 25    101.00     41.40     32.49     31.13     29.37
 30    118.96     59.13     42.79     39.39     37.68
 35    144.45     84.27     57.28     50.96     49.39
 40    180.58    119.75     77.54     67.15     65.79
 45    220.68    161.61    101.49     86.16     84.86
 50    275.28    210.91    129.57    108.32    107.09

Table 4.4d  Total Run-time Ratios for QZ

  N     F40/AF   AFW/AF   AFG/AF   AFH/AF

  5      6.29      .99     1.29     1.45
 10      5.31      .98     1.17     1.28
 15      4.69     1.11     1.14     1.21
 20      4.09     1.26     1.13     1.14
 25      3.44     1.41     1.11     1.06
 30      3.16     1.57     1.14     1.05
 35      2.92     1.71     1.16     1.03
 40      2.74     1.82     1.18     1.02
 45      2.60     1.90     1.20     1.02
 50      2.57     1.97     1.21     1.01

Table 4.4e  AF Optimization Statistics for QZ

       Execution   Optimization   % of
  N    Time        Time           Execution

  5      2.29         1.44         62.88
 10      4.81         2.48         51.56
 15      7.63         2.77         36.30
 20     12.02         3.20         26.62
 25     20.32         4.39         21.60
 30     28.63         4.40         15.37
 35     40.34         4.48         11.11
 40     56.74         4.65          8.20
 45     75.81         4.65          6.13
 50     98.04         4.65          4.74

Table 4.5a  AFI Timings for QZ

       Execution   Total
  N    Time        Run-time

  5      2.58       11.63
 10     16.69       25.74
 15     46.10       55.15

Table 4.5b  AFI Timings for LES

       Execution   Total
  N    Time        Run-time

  5       .28        1.20
 10      1.30        2.22
 15      3.79        4.71
 20      8.40        9.32
 25     15.80       16.72

Table 4.6a  Refined AF Timings for QZ

       Execution   Optimization   % of         Total
  N    Time        Time           Execution    Run-time

  1      .17          .00           0.00         9.22
  2      .42          .18          42.86         9.47
  3     1.08          .59          54.63        10.13
  4     1.72         1.01          58.72        10.77
  5     2.29         1.44          62.88        11.34
  6     2.74         1.70          62.04        11.79
  7     3.16         1.95          61.71        12.21
  8     4.09         2.38          58.19        13.14
  9     4.26         2.41          56.57        13.31
 10     4.81         2.48          51.56        13.86

Table 4.6b  Refined AF Timings for LES

       Execution   Optimization   % of         Total
  N    Time        Time           Execution    Run-time

  1      .08          .00           0.00         1.00
  2      .12          .03          25.00         1.04
  3      .17          .06          35.29         1.09
  4      .21          .08          38.10         1.13
  5      .25          .09          36.00         1.17
  6      .35          .18          51.43         1.27
  7      .41          .22          53.66         1.33
  8      .47          .26          55.32         1.39
  9      .49          .26          53.06         1.41
 10      .56          .29          51.79         1.48

4.4  Analysis of Test Results

Before analyzing the position of the AF performance curve relative to
the curves for AFW, AFG and AFH, we first analyze the relative positions of
the AFW, AFG and AFH curves and see if they conform to expectations.

AFW, AFG and AFH differ only in the amount of compile-time
optimization they apply to a program, and thus in the efficiency of the
machine language they produce. At the start of execution, the AFH curve
lies above the AFG curve which in turn lies above the AFW curve. Because
of the relative efficiency of the code, the AFW curve eventually will cross
the other two, and the AFG curve will cross the AFH curve. These
crossover points occur when the difference in compilation time equals the
difference in execution time. It is expected then that if a program is run
long enough, the AFW curve will lie above the AFG curve which in turn will
lie above the AFH curve. If, however, the additional optimization (CM)
performed by AFH has no effect, i.e., does not remove any invariant quads
from any loop, then the effort is wasted and the AFG curve will lie below
the AFH curve.
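
The crossover condition can be checked directly against the tabulated total run-times. The sketch below, which is an illustration and not part of the original measurement procedure, scans two curves for the first interval of the control parameter in which one system goes from being no slower to being slower than the other. The data are the first five rows of the QZ total run-times in Table 4.4c; the function name `crossover_interval` is hypothetical.

```python
def crossover_interval(xs, a, b):
    """Return (x_lo, x_hi) for the first interval where curve a rises above curve b.

    Curve a is at or below b at x_lo and strictly above b at x_hi.
    Returns None if no such interval exists in the data.
    """
    for i in range(len(xs) - 1):
        if a[i] <= b[i] and a[i + 1] > b[i + 1]:
            return xs[i], xs[i + 1]
    return None


# QZ total run-times in seconds for N = 5, 10, 15, 20, 25 (Table 4.4c).
N = [5, 10, 15, 20, 25]
AFW = [11.32, 13.72, 18.55, 26.59, 41.40]
AFG = [14.68, 16.19, 19.09, 23.85, 32.49]
AFH = [16.48, 17.78, 20.20, 24.08, 31.13]

if __name__ == "__main__":
    print("AFW crosses AFG between N =", crossover_interval(N, AFW, AFG))
    print("AFW crosses AFH between N =", crossover_interval(N, AFW, AFH))
    print("AFG crosses AFH between N =", crossover_interval(N, AFG, AFH))
```

Only the bracketing interval is reported, since the curves are not linear between test points and an interpolated crossover value would suggest more precision than the data support.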

By  checking  the  tables  and  figures  for  each  test  program,  it  is  seen 
that  the  curves  follow  this  behavior  pattern.  Only  for  the  test  program 
SIEVE2  does  the  AFG  curve  lie  below  the  AFH  curve.  The  reason  is  exactly 
that  given  above,  viz.,  CM  has  no  effect.  An  examination  of  the  optimization 
results  showed  that  the  loops  did  not  contain  any  invariant  quads. 

The performance curves for the four test programs indicate that the
range of applicability for AFG is very narrow since the crossover point
between AFW and AFG is very close to the crossover point for AFG and
AFH. One can only conjecture that AFG is not necessary, and that one
should run under either AFW or AFH depending on the length of execution.


As for the AF curve, it is expected to initially lie below all the other
curves since it performs no initial optimizations. This is indeed the case. If
a program is run long enough, the AF curve will asymptotically become
parallel to the AFH curve because the executable code becomes identical to
that produced by AFH. Indications are that it approaches the AFH curve from
above if the time to totally optimize the program exceeds the compile-time
optimization time for AFH; otherwise it approaches it from below.


There is a range in which the AF curve might cross over one or more
of  the  other  curves  and  then  cross  back  under.  This  occurs  for  small 
execution  times.  The  width  of  the  range  seems  to  depend  on  the  diversity 
of  the  program’s  loop  structure.  Consider  each  test  program  in  terms  of 
increasingly  diverse  loop  structures: 

1)  EE (see Figure 4.1): A short program having one loop
which constitutes most of the program. Since this loop
is an innermost loop, the only difference between AF
and AFH is that AF first translates the loop to "dumb"
code which is executed before being totally optimized.
Hence, AF’s execution times differ from those for AFH
by a constant.

2)  SIEVE2 (see Figure 4.2): Like EE, this program contains
a main execution loop which constitutes most of the
program and gets totally optimized almost immediately.
However, it contains a few embedded loops and is
therefore considered an outermost loop. Hence its
optimization is more gradual than that of the loop in EE, but
not gradual enough because representations are changed
before it is necessary. The spike occurring at the
beginning of the AF curve is due to CM having no
effect on the outermost loop. This optimization time is
wasted and the machine language segment is identical to
that produced by CEE.

3)  LES (see Figure 4.3): This program consists of three
program units, each containing a number of loops. Each
subprogram unit contains a doubly nested loop which
accounts for most of its execution time. One
program unit gets called only once, so the benefits of
total optimization are wasted if the program does not
run long enough. The other program unit is called
repetitively, so eventually the double loop and the entire
subprogram get totally optimized. The results are
excellent, for only AFW is slightly better than AF for
small execution times. The initial part of the AF curve
is not smooth due to the optimization perturbations, which
are more apparent for small execution times.

4)  QZ (see Figure 4.4): This program contains the most
diverse loop structure and consists of 10 program units.
Four of the units constitute the main part of the
program, and each is called only once. There are a
large number of inner loops of 1-2 basic blocks whose
early optimization probably contributes to the fact that
the AF curve is the best of any of the test programs. This is
also the only case in which the time for total dynamic
optimization is smaller than AFH’s optimization time. The
initial part of the AF curve is not smooth for the same
reason stated for LES.

The test results indicate that AF does not outperform all the other
systems across the entire spectrum of run-times, but that for a particular
program there is a given range in which one compiler system is preferable
to any of the others. However, AF is better than any other single
compiler system over the spectrum. Thus, we conclude that it is better to
build one compiler system to cover the run-time spectrum than three
separate specialized compiler systems, each designed for a different range of
the spectrum.


Chapter V


Conclusions 


This dissertation investigated the possibility of improving the cost
effectiveness of code optimization. Whereas current approaches apply code
optimization equally to the entire program at compile-time, our approach
exploits dynamically the observed behavioral characteristics of programs, viz.,
that a small part (5%) of the code accounts for a large portion (50%) of the
execution time. We studied, in general, the problems of performing code
optimizations at run-time, i.e., dynamically determining which sections of code
to optimize, how much optimization to apply, and when to apply that
optimization. This resulted in the specification of a number of adaptive
schema. The most promising scheme was incremental dynamic optimization,
which uses optimization counts to determine which section of code to optimize
and when. The effect is gradual optimization of a program, i.e., one
optimization is applied to one section of code at a time. The longer the
program executes, the more optimized a section becomes. Using this scheme,
a prototype system was built for an interesting subset of the FORTRAN
language. Performance of this system, Adaptive FORTRAN (AF), was measured
on a representative set of programs. In order to make unbiased comparisons
with existing compiler systems, the adaptive system was transformed into
various "normal" compiler systems that generate code analogous to that
produced by WATFIV, FORTRAN-IV G and FORTRAN-IV H. The same set of
test programs was run under these transformed systems, and the
performance measurements compared against those for AF.
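
The idea of letting optimization counts drive gradual optimization can be sketched as follows. This is a minimal illustration, not AF's mechanism: the `Section` class, the threshold values, and the numeric optimization levels are all assumptions introduced here, and AF derives its counts from the program's loop structure rather than from a fixed list.

```python
class Section:
    """A section of code with an execution counter and an optimization level."""

    def __init__(self, name, thresholds):
        self.name = name
        self.thresholds = thresholds  # hypothetical optimization counts, ascending
        self.count = 0                # number of executions observed so far
        self.level = 0                # 0 = unoptimized; one step per optimization

    def execute(self):
        """Record one execution; apply at most one further optimization."""
        self.count += 1
        if (self.level < len(self.thresholds)
                and self.count >= self.thresholds[self.level]):
            self.level += 1           # one optimization, to this section only
            return True
        return False


hot = Section("inner loop", thresholds=[2, 5, 20])
cold = Section("setup code", thresholds=[2, 5, 20])
cold.execute()
for _ in range(6):
    hot.execute()
```

After this run the frequently executed section has received two optimizations while the section executed once has received none, which mirrors the statement that the longer a program executes, the more optimized a section becomes.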

The results were very encouraging. While AF did not outperform each
of the other systems at all points in the run-time spectrum, it did perform
better over the spectrum than did any other single compiler system. The
major remaining problem lies in controlling the rate of optimization. AF’s
performance curves look worst for small-medium run-times, indicating too much
optimization is being applied too soon. More research is needed to find a
better means for controlling the optimization rate.

AF is the last of an evolutionary chain of experimental systems and
there is every reason to believe it is possible to construct other variants
which control optimization better and outperform all fixed-strategy systems
everywhere in the run-time spectrum. The first line of attack should be to
continue working with optimization counts. The method for estimating
optimization counts presented in Section 2.5, viz., using the performance
curves, E(q), for each optimization, should be explored. It would not be
difficult to obtain such curves. Determining optimization counts heuristically
has its limitations, for we found it hard to change an optimization count so
only a portion of the performance curve is affected. Therefore, if any
appreciable progress is to be made, a more theoretical basis for determining
them must be developed. After this line of attack has been exhausted, other
computationally feasible mechanisms and/or parameters for controlling the rate
of optimization should be explored.


In order to evaluate how good the incremental dynamic optimization
scheme is, and determine exactly how much better we can expect to do, the
absolute measure of performance should be obtained, for each test program,
using the iterative dynamic optimization scheme. Using a large amount of
computing effort, this performance curve can be obtained in the following
manner. First, certain measurements must be made. For each optimizer, this
consists of determining its performance curve, E(q), and its space requirements.
For each segment of the program, measurements of its execution time and
space requirements in all possible representations must be made. Using these
measurements, optimal policies can be determined at run-time over the
execution spectrum. But since it is not known when to determine such
policies, they would have to be continuously determined, say after the
execution of a basic block or segment, or a quantum of execution time.
The point at which the policy changes determines when to optimize. Using these results,
the program would then be run for a given test point, policies changed at
the appropriate time, and its execution time measured. The entire
performance curve for the program can be obtained in this manner. The
resulting curve does not contain the time required to determine the optimal
policy; therefore it is the absolute best one can expect from any strategy.

We feel that we have demonstrated a worthwhile alternative to compiler
design that should be considered seriously. The approach makes more sense,
from an implementation viewpoint, than building many special purpose
compilers. The system can be built in an incremental fashion because of its
modularity. Each step consists of programming and debugging an optimizer
module and then adding it to the system. The final product is an adaptive
compiler that does not require much more effort to build than a full
optimizing compiler. AF was built in this manner: in 3 man months we had
programmed the compiler and interpreter, and had programs running. Then
each optimizer was programmed in 1/2 to 1 man month, debugged and added
to the system. In less than a man year, the system was completed.

It is clear that such an implementation approach is open ended up to a
point, for one can keep improving the efficiency of the generated code by
adding more efficient optimizations until one exhausts optimizations. There
are other well defined optimizations that work with the same internal form
we produce; they should be added to the system and the performance
measurements retaken, e.g., strength reduction, opening subroutines, and other
machine dependent optimizations. There is one problem associated with
adding more optimizations that became apparent as we constructed AF, viz.,
controlling the rate of optimization becomes harder. The means of control
must be defined more sharply. This is the main reason why AF’s optimization
counts are determined as a function of the program’s loop structure. As each
optimizer was added, it became apparent that basic blocks and segments
could no longer be treated uniformly with respect to the optimization counts.

Thus we see that additional research is needed to understand dynamic
optimization more clearly and to refine the current approach.
However, other areas are suggested on which further research should be
conducted. It would be interesting to see if some hardware features can be
developed to aid in controlling optimization. One beneficial feature would be
one for determining which section of code is being executed the most. There is
one existing hardware feature we have not exploited for improving execution
time, viz., micro-programming. There are two areas in the system that could
utilize this feature. One is in interpreting the internal form produced by the
compiler. Instead of writing a program to interpret it, micro-code could be
developed for each operation. The other area is in the machine language
generators. Instead of generating optimized code, specialized micro-code could
be generated that performs the operations more efficiently.

Finally, the implications of our ideas should be studied with respect to
conversational languages as indicated by Mitchell [Mit70]. He stated that a
major problem in designing an interactive programming system is determining
how to get efficiency and flexibility, two opposing constraints, to co-exist.
His solution was to build an interpreter/compiler system. In such a system, a
program is partially interpreted (to provide flexibility for the user) or
compiled (to provide efficient use of the computer) depending on its usage
and constancy over some period of time. We see no conceptual problem in
incorporating dynamic optimization into such a system in order to further
improve efficiency. All that we would be doing is replacing the mechanism
that controls the compilation of code with a more refined one. Whenever
changes are made to the program, the internal form used by the interpreter
could be regenerated for those sections of the program affected, and their
optimization state reset so they would be executed interpretively. As the
program executes, these code sections would again be dynamically optimized.

In summary then, our test results indicate that the adaptive process is
a  worthwhile  and  promising  technique.  As  our  understanding  of  program 
behavior  increases  and  our  programming  styles  become  more  formalized,  it 
may  turn  out  to  be  one  of  the  most  sensible  approaches  for  designing 
compiler  systems. 

Appendix A
The  Compiled  Code 


To  aid  in  syntax  analysis,  optimization  and  code  generation,  the  compiler 
translates  the  source  code  into  an  internal  form.  A  number  of  internal  forms 
are  possible:  Polish  notation,  quadruples,  triples,  indirect  triples  or  trees 
(cf.  Gries  [Gri71]).  Optimizing  compilers  have  been  built  using  different 
internal  forms,  viz.,  FORTRAN  IV  H  [Low69]  uses  quadruplets, 
FORTRAN  II  [AII69]  uses  indirect  triples,  and  BLISS  [Bli71]  uses  trees.  Which 
form  to  use  is  a  matter  of  taste. 

The adaptive FORTRAN compiler uses two internal forms. The
compile-time internal form is Polish postfix, which is used for syntax analysis
and code generation. The run-time internal form is the generated code and
consists of quadruples†, or quads for short. This form is an expanded
version of the smaller and more concise source code in which language
constructs (e.g., DOs, IFs, subscripts, tests) are expressed as basic operations.

† Also known as three-address code.

A.1  Quadruples

There are a number of reasons why quads were selected as the
run-time internal form. The main reason was that they were a convenient
form that could be efficiently processed by the optimizers and executed
interpretively. Other reasons were:

1)  A quad is self contained, i.e., it is not necessary to
reference the result of another quad when processing
its arguments.

2)  Quads appear in the order in which they are to be
executed.

3)  Functions will know precisely where to return their
results.

For a single binary operator, quads have the form:

(OP, ARG1, ARG2, ARG3)

where ARG1 and ARG2 specify the operands, ARG3 the result temporary, and
OP the operation to be performed. Not all operations require three
arguments; some require one (e.g., branches) while others require two (e.g.,
conversions of type and unary operators). As a convention, unused positions
of a quad are left blank.
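
As an illustration of this layout, a quad can be modelled as a four-field record in which unused positions are left empty. The following is a minimal sketch; the `Quad` class and its field names are assumptions made for the example and say nothing about how AF actually packs a quad into machine words.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One quadruple: an operation, up to two operands, and a result field."""
    op: str
    arg1: Optional[str] = None
    arg2: Optional[str] = None
    arg3: Optional[str] = None   # None models a position left blank

    def __str__(self):
        fields = [self.op] + [a if a is not None else "" for a in self[1:]]
        return "(" + ", ".join(fields) + ")"


add = Quad("ADD", "A", "B", "T1")      # binary operator: three arguments
neg = Quad("NEG", "A", None, "T2")     # unary operator: two arguments
br = Quad("B", "*k'")                  # branch: one argument
```

Printing these records reproduces the notation used in the rest of this appendix, with blanks standing for the unused positions.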

A.2  Code  Generated  for  each  FORTRAN  Construct 

The adaptive FORTRAN compiler is one pass and generates relocatable
interpretive code (i.e., quads) directly to core. If the program contains no
errors, the relocatable code is loaded by a fast loader which maps
relocatable addresses into absolute addresses and allocates data storage.

The generated code for some of the FORTRAN constructs is strictly
quads (e.g., arithmetic operations); others are a combination of quads and
machine language (e.g., calls to mathematical functions); while others are pure
machine language (e.g., I/O). Thus when the program is loaded, the
instruction storage consists of quads with possible embedded machine language
and pure machine language compiled out of sequence.


The descriptions of the generated code given below use the following
programming  conventions: 

1)  PDP  10  machine  language  is  represented  in  MACRO-10 
assembly  language  (cf.  [PDP71a]). 

2)  A  colon  following  a  symbol  indicates  the  symbol  is  a 
label. 

3)  The character * preceding a symbol indicates indirect
addressing†.

4)  A  period  following  a  symbol  indicates  it  represents  a 
FORTRAN  UUO  (cf.  [PDP71b]),  i.e.,  a  call  on  the  FORTRAN 
run-time  support  system. 

5)  A  period  represents  the  current  address. 

6)  For  arithmetic  operations,  the  basic  mnemonic  has  a 
single  letter  prefix  to  indicate  the  arithmetic  mode: 

a)  no  prefix  -  integer  OP 

b)  F  -  floating  point  OP 

c)  D  -  double  precision  OP 

d)  C  -  complex  OP 

e)  L  -  logical  OP 

f)  S  -  string  OP 

A  complete  list  of  the  OP  mnemonics  is  given  in  Table 
A.1  along  with  a  brief  description. 

7)  The meta-language variables used in the syntactic forms
are the same as those used in the American Standard
Fortran Report [ASF66]. Their meanings are generally
obvious from context.

8)  The subscripted letter T as an argument of a quad
represents a temporary.

9)  addr(v) represents the address of v.

10)  Formatted data words are specified by the pseudo-ops
DATA, DESC1, DESC2, DESC3 and TEXT. Their internal
representations are given in Figure A.1.

† MACRO-10 uses the ’@’ symbol instead, a convention we will not follow.

A.2.1  Expressions

A.  Arithmetic

a)  Binary operator

FORM:  e1 <bop> e2

CODE:  (OP, e1, e2, T)

where <bop> ::= + | - | * | / | **

The OP mnemonics can be found in Table A.1.

b)  Negation 

FORM:  -e 

CODE:  (NEG,  e,  ,  T) 

B.  Relational

FORM:  e1 <rop> e2

CODE:  (OP, e1, e2, T)

where <rop> ::= .LT. | .LE. | .EQ. | .NE. | .GE. | .GT.

The OP mnemonics can be found in Table A.1.

C.  Logical

a)  Binary operator

FORM:  e1 <lop> e2

CODE:  (OP, e1, e2, T)

where <lop> ::= .AND. | .OR. | .XOR. | .EQV.

The OP mnemonics can be found in Table A.1.

b)  Unary .NOT.

FORM:  .NOT. e

CODE:  (NOT, e, , T)
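
The expression templates above compose recursively: each operator yields one quad whose result temporary becomes an operand of the enclosing operator. The sketch below illustrates this for integer arithmetic only, using the ADD, SUB, MPY, DIV and NEG mnemonics that appear in this appendix. The nested-tuple expression form, the `gen_quads` function, and the temporary-numbering scheme are assumptions made for the example, not the compiler's actual one-pass algorithm.

```python
BINARY_OPS = {"+": "ADD", "-": "SUB", "*": "MPY", "/": "DIV"}  # integer mode only


def gen_quads(expr, quads, temps):
    """Append quads for `expr` to `quads` and return the name holding its value.

    `expr` is a variable name, a 2-tuple ("-", e) for negation, or a 3-tuple
    (bop, e1, e2). `temps` is a one-element list used as a temporary counter.
    """
    if isinstance(expr, str):
        return expr                       # operand is already addressable
    if len(expr) == 2:                    # negation: (NEG, e, , T)
        operand = gen_quads(expr[1], quads, temps)
        args = ("NEG", operand, "")
    else:                                 # binary: (OP, e1, e2, T)
        left = gen_quads(expr[1], quads, temps)
        right = gen_quads(expr[2], quads, temps)
        args = (BINARY_OPS[expr[0]], left, right)
    temps[0] += 1
    result = f"T{temps[0]}"
    quads.append(args + (result,))
    return result


quads = []
gen_quads(("+", "A", ("*", ("-", "B"), "C")), quads, [0])
```

For A + (-B) * C this produces a NEG quad, then an MPY quad, then an ADD quad, in the order in which they are to be executed.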

A.2.2  Assignment  Statement 

FORM:  v1 = v2 = ... = vn = e
CODE:  (REPL, e, , vj)  j=1,...,n

A.2.3  Control  Statements 

A.  GO TO statements

a)  Unconditional

FORM:  GO TO k

CODE:  (B, *k', , )

k':  DATA addr(k)  (into data storage)

b)  Assigned

FORM:  GO TO v

CODE:  (B, *v, , )

c)  ASSIGN statement

FORM:  ASSIGN k TO v

where v is a simple variable.

CODE:  (REPL, k, , v)

B.  IF statement

a)  Arithmetic

FORM:  IF(e)k1,k2,k3

where kj is a statement label or assigned variable.

1)  k1≠k2≠k3

CODE:  (BLZ, e, *k1', )
       (BEZ, e, *k2', )
       (B, *k3', , )

k1':  DATA addr(k1)
k2':  DATA addr(k2)  (into data storage)
k3':  DATA addr(k3)

2)  k1=k2

CODE:  (BGZ, e, *k3', )
       (B, *k1', , )

3)  k1=k3

CODE:  (BEZ, e, *k2', )
       (B, *k1', , )

4)  k2=k3

CODE:  (BLZ, e, *k1', )
       (B, *k2', , )

b)  Logical

FORM:  IF(e)S

CODE:  (BF, e, L, )
       {code for S}
L:

1)  S is GO TO k

CODE:  (BT, e, *k', )

2)  S is STOP

CODE:  (STOPT, e, , )

3)  S is RETURN

CODE:  (EXTFT/EXTST, e, , )
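
The four arithmetic IF cases above amount to a selection on which labels coincide: when two labels are equal, one conditional branch and one unconditional branch suffice. The following sketch illustrates that selection. The function name and the tuple layout are assumptions made for the example; the mnemonics are those shown in the templates.

```python
def arithmetic_if(e, k1, k2, k3):
    """Return the quads for IF(e)k1,k2,k3 following the four cases in the text."""
    def ref(k):
        return f"*{k}'"                   # indirect reference through k'

    if k1 == k2:                          # negative and zero share a target
        return [("BGZ", e, ref(k3), ""), ("B", ref(k1), "", "")]
    if k1 == k3:                          # negative and positive share a target
        return [("BEZ", e, ref(k2), ""), ("B", ref(k1), "", "")]
    if k2 == k3:                          # zero and positive share a target
        return [("BLZ", e, ref(k1), ""), ("B", ref(k2), "", "")]
    return [("BLZ", e, ref(k1), ""),      # all three labels distinct
            ("BEZ", e, ref(k2), ""),
            ("B", ref(k3), "", "")]
```

For example, `arithmetic_if("X", "10", "10", "30")` yields two quads, whereas three distinct labels yield three.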


C.  Subprogram call

a)  Subroutine

1)  FORM:  CALL s

CODE:  (CALLS, *s', L1, )

L1:  DATA 0        (out of sequence)
s':  DATA addr(s)  (into data storage)

2)  FORM:  CALL s(a1,a2,...,an)

CODE:  (CALLS, *s', L1, )

L1:  DATA n
     DESC1 TY1,AR1,L1,addr(a1)
     ...
     DESC1 TYn,ARn,Ln,addr(an)

where TYj is the type of aj (see Table A.2),
      ARj is the arithmetic of aj (see Table A.3),
      Lj is the class of aj (see Table A.4).

b)  Function

FORM:  f(a1,a2,...,an)

CODE:  (CALLF, *f', L1, T)

L1:  DATA n
     DESC1 TY1,AR1,L1,addr(a1)
     ...
     DESC1 TYn,ARn,Ln,addr(an)

where T is the temporary storage location where the
functional value is to be returned.


c)  Basic external library function

FORM:  xlf(a1,...,an)

CODE:  (XCT, xlf, , T)
       ARG <type code of a1>,addr(a1)
       ...
       ARG <type code of an>,addr(an)

where ARG has the same format as I/O UUO’s (see
Sec. A.2.4). T is the temporary storage location
where the functional value is to be returned.

D.  RETURN  statement 

FORM:  RETURN 

a)  Subroutine 

CODE:  (EXITS,  ,  ,  ) 

b)  Function 

CODE:  (EXITF,  fv,  ,  ) 

where  fv  is  the  address  of  the  functional  value. 

E.  DO statement

FORM:  DO k v=e1,e2,e3

where v is a simple variable,
      ej are arithmetic expressions which are converted
      to the type of v. e3 may be omitted, in which
      case it is 1.

a)  e3 not a constant

CODE:  (REPL, e1, , v)
L1:    {range of DO}
       (ADD, v, e3, v)
       (SUB, v, e2, T1)
       (NEGL, T1, e3, T2)
       (BLEZ, T2, L1, )

b)  e3 a constant

1)  e3>0

CODE:  (REPL, e1, , v)
L1:    {range of DO}
       (ADD, v, e3, v)
       (BLE, v, e2, L1)

2)  e3<0

CODE:  (REPL, e1, , v)
L1:    {range of DO}
       (ADD, v, e3, v)
       (BGE, v, e2, L1)
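
The three DO templates can be summarized by a small emitter that picks the closing test according to what is known about e3 at compile time. This is a sketch under stated assumptions: `do_loop_quads`, `trip_values`, and the `step_sign` parameter are hypothetical names introduced here, and the NEGL quad is copied from the template without asserting its exact semantics. The second function simulates only the constant-step templates, to show that the test sits at the bottom of the loop.

```python
def do_loop_quads(v, e1, e2, e3="1", step_sign=1):
    """Quads for DO k v=e1,e2,e3, with "{range of DO}" marking the loop body.

    step_sign is +1 or -1 when e3 is a constant of known sign, and None
    when e3 is not a constant.
    """
    quads = [("REPL", e1, "", v), "L1:", "{range of DO}", ("ADD", v, e3, v)]
    if step_sign is None:                 # sign of e3 unknown until run time
        quads += [("SUB", v, e2, "T1"),
                  ("NEGL", "T1", e3, "T2"),
                  ("BLEZ", "T2", "L1", "")]
    elif step_sign > 0:                   # counting up: loop while v <= e2
        quads.append(("BLE", v, e2, "L1"))
    else:                                 # counting down: loop while v >= e2
        quads.append(("BGE", v, e2, "L1"))
    return quads


def trip_values(e1, e2, e3):
    """Values of v seen by the body; the test sits at the bottom of the loop."""
    v, seen = e1, []
    while True:
        seen.append(v)
        v += e3
        if (v > e2) if e3 > 0 else (v < e2):
            return seen
```

Because the branch follows the range of the DO, the body is executed once even when the initial value already exceeds the limit, as `trip_values(5, 1, 1)` shows.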

F.  CONTINUE  statement 

FORM:  CONTINUE 

CODE:  none 

G.  END  statement 

FORM:  END 

CODE:  (STOP,  ,  ,  )  (only  for  the  main  program) 

A.2.4  I/O Statements

I/O is performed by the PDP-10 FORTRAN I/O package. Since this
section of code is fixed, it cannot be optimized at run time. Hence I/O time
is constant regardless of the optimizations made to the code. It would be
wasteful (timewise) to have to transform interpretive I/O code to machine
language. Therefore, the compiled code is identical to that produced by the
PDP-10 FORTRAN compiler, F40. For a description of the FORTRAN UUO’s IN.,
OUT., DATA., SLIST. and FIN., and ARG and type codes, see the PDP-10
FORTRAN handbook [PDP71b].

In what follows, the code is given in MACRO-10 format. Also, R0 and
R1 represent machine registers 0 and 1 respectively.

A.  Initialization

CODE:  MOVE  R1,<format pointer>

a)  Input

FORM:  READ f,list
       READ f
       READ(u,f)list
       READ(u,f)
       READ(u,f,END=c)list
       READ(u,f,ERR=d)list
       READ(u,f,END=c,ERR=d)list

CODE:      IN.   R1,<unit number>

    or     MOVE  R0,<integer variable>
           HRRM  R0,.+1
           IN.   R1,0

(if ERR or END specified)

           MOVE  R0,<label pointer>
           HRRM  R0,*<END/ERR>

where END/ERR are cells containing the address of
END. and ERR., the cells used by the I/O
package.

b)  Output

FORM:  PRINT f,list
       PRINT f
       TYPE f,list
       TYPE f
       WRITE(u,f)list
       WRITE(u,f)

CODE:      OUT.  R1,<unit number>

    or     MOVE  R0,<integer variable>
           HRRM  R0,.+1
           OUT.  R1,0

B.  Data transmission and I/O lists

FORM:  E1,E2,...,En

where Ej can be a simple variable, subscripted variable,
expression or array name, but not a DO-implied list.

a)  Simple variable, constant, or expression (result)

1)  non-parameter

CODE:  DATA.  <type code>,<variable/constant/result>

2)  parameter

CODE:  DATA.  <type code>,<parameter cell>

b)  Array

1)  non-adjustable dimensions

CODE:  SLIST.  <type code>,<base address of array>
       ARG     0,<number of elements>

2)  adjustable dimensions

CODE:  MOVE    R0,addr(DVEC)+n+1
       HRRM    R0,.+2
       SLIST.  <type code>,*<parameter cell>
       ARG     0,0

where DVEC is the array’s dope vector (see Sec. A.2.5).

c)  Subscripted variable

CODE:  DATA.  <type code>,*<temp storage cell>

where <temp storage cell> contains the address of the array
element.

C.  Termination

CODE:  FIN.  0,0

Since the I/O code is in machine language, it cannot be mixed
with the interpretive code. Therefore it is compiled out of sequence
under a different relocation base. In order to execute it, it is made
into a subroutine:

<I/O routine>:  BYTE  0
                {I/O code}
                JRST  2,*<I/O routine>

To execute the routine, the following quad is compiled:

(JSR, <I/O routine>, , ).

The effect of the JSR is the execution of the machine language
instruction:

JSR  0,<I/O routine>.

When the JSR quad is transformed to machine language, the JSR
machine language instruction is generated.

D.  FORMAT statement

FORM:  k FORMAT(S1,S2,...,Sn)

CODE:

k':  DATA addr(k)  (into data storage)

k:  TEXT '(S1,S2,...,Sn)'  (out of sequence)

A.2.5  Array Declarations

An array declarator may appear in a DIMENSION statement, type
declaration or COMMON statement.

FORM:  v(d1,d2,...,dn)

where dj are integers or simple integer variables,
      n is the dimension of the array.

CODE:  (generated for arrays with adjustable dimensions)

(PUSHJ, ADEC, , )
DATA   n
DATA   addr(DVEC)
DESC2  R1,addr(d1)
...
DESC2  Rn,addr(dn)

where

a)  ADEC is the run time array declaration routine which
generates the dummy array’s dope vector, DVEC. The dope
vector has the form:

DVEC:  DATA  FUDGE
       DATA  D2
       ...
       DATA  Dn
       DATA  SIZE  (number of elements)

b)  Rj is the reference of dj (see Table A.5).

c)  If v is a dummy parameter, its value will be set by the
run-time routine PSA and depends on the corresponding
actual parameter. If the actual parameter is:

1)  a subscripted variable, PSA stores the address
of this element into v,

2)  an array name, PSA stores the BASEv of the
array into v. If the array name is not itself a
parameter, its descriptor to PSA contains the
BASEv; if a parameter, the parameter cell
contains the BASEv.


A.2.6  Array  References 

References to array elements must contain the number of subscripts
that corresponds to the number of dimensions declared for the array.
Element v(e1,e2,...,en) is at location:

BASEv + (e1*D1 + ... + en*Dn) + FUDGEv        (1)

where

a)  BASEv is the address of the first element of the array v,
which has d1*...*dn elements, the dj being the declared
dimensions (see Sec. A.2.5).

b)  Dj is defined recursively as follows:

D1 = 1

Dj = d(j-1) * D(j-1)    for j = 2,...,n

c)  FUDGEv = -(D1 + ... + Dn)

A)  Array with non-adjustable dimensions

In this case, all the information necessary to evaluate
(1) at compile-time is stored in the dictionary along with the
array's data descriptor.

1)  Array not a dummy parameter

n=1    (ADD, e1, Fv, T1)

n>1    (MPY, e2, D2, T1)
       (ADD, T1, e1, T2)
       (MPY, e3, D3, T3)
       (ADD, T3, T2, T4)
       ...
       (MPY, en, Dn, T2n-3)
       (ADD, T2n-3, T2n-4, T2n-2)
       (ADD, T2n-2, Fv, T2n-1)

where Fv = BASEv + FUDGEv

2)  Array a dummy parameter

Replace the last instruction above with:

       (ADD, T2n-2, v, T2n-1)
       (ADD, T2n-1, FUDGEv, T2n)

where v is the dummy parameter whose value is the
BASE of the actual parameter,

FUDGEv is the FUDGE for the dummy array
parameter calculated from its declaration at
compile time.

B)  Array with adjustable dimensions

n=1    (ADD, v, DVEC, T1)
       (ADD, T1, e1, T2)

n>1    (ADD, v, DVEC, T1)
       (ADD, T1, e1, T2)
       (MPY, e2, DVEC+1, T3)
       (ADD, T3, T2, T4)
       ...
       (MPY, en, DVEC+n-1, T2n-1)
       (ADD, T2n-1, T2n-2, T2n)

where v is the dummy parameter whose value is the
BASE of the actual parameter.
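
Formula (1) can be checked with a short calculation. The sketch below takes the multipliers Dj to be built from the declared dimensions dj (D1 = 1, each later Dj being the previous dimension times the previous multiplier), assumes 1-based subscripts and one storage word per element, and uses hypothetical function names and a hypothetical base address.

```python
def multipliers(dims):
    """D1 = 1 and Dj = d(j-1) * D(j-1): the stride of each subscript."""
    D = [1]
    for d in dims[:-1]:
        D.append(d * D[-1])
    return D


def element_address(base, dims, subs):
    """Location of v(e1,...,en) from formula (1), for 1-based subscripts."""
    D = multipliers(dims)
    fudge = -sum(D)                       # FUDGE = -(D1 + ... + Dn)
    return base + sum(e * Dj for e, Dj in zip(subs, D)) + fudge


# A hypothetical 3 x 4 array whose first element is at address 1000.
first = element_address(1000, (3, 4), (1, 1))
last = element_address(1000, (3, 4), (3, 4))
```

The FUDGE term simply cancels the contribution of the subscript value 1 in every dimension, so that v(1,...,1) falls exactly on the base address and the first subscript varies fastest.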

A.2.7  Subprograms

A.  FUNCTION Subprograms

FORM:  t FUNCTION f(a1,a2,...,an)

where t is optional and can be INTEGER, REAL or LOGICAL,
      aj is a dummy parameter.

Functions must have at least one dummy parameter. A
RETURN statement must be supplied. The name of the function is
treated as a scalar variable for storing the value of the function.
Storage for the functional value is allocated as for normal scalars.

Functions are referenced within expressions and return a
value. The code generated for a function reference is given in
Section A.2.3.

B.  SUBROUTINE Subprograms

FORM:  SUBROUTINE s

   or  SUBROUTINE s(a1,a2,...,an)

where aj is a dummy parameter.

C.  Code generated for a subprogram definition

CODE:  (PUSHJ, PSA, , )
       DATA   n
       DESC3  TY1,AR1,psi1
       ...
       DESC3  TYn,ARn,psin

where PSA is the run time parameter assignment routine,
      TYj is the type of aj (see Table A.2),
      ARj is the arithmetic of aj (see Table A.3),
      psij is the parameter storage index for aj.

PSA matches the actual parameters with the formal
parameters. Since all parameters are call by address, no conversion
of type is possible. Therefore arithmetics must match. Using
{TY,AR,L} in the subprogram reference and {TY,AR} in the
subprogram definition, PSA calculates the address of the actual
parameter and inserts it into the corresponding psi. Thus,
references to the actual parameter are indirect through its psi.


The  function  of  each  tag  bit  is: 

1)  Indirect  addressing  indicators  I1-I3 

If  lj  is  0,  ARGj  is  the  address  of  the  operand;  if  lj 
is  1,  ARGj  contains  the  address  of  the  operand  (indirect 
result  or  parameter). 

2)  Temporary  address  indicators  T1-T3 

Tj  is  1  if  ARGj  is  the  address  of  a  temporary; 
otherwise  0.  These  indicators  exist  for  efficiency 
purposes.  Temporarys  are  the  most  heavily  processed 
entities,  and  even  though  it  is  possible  at  run  time  to 
determine  if  an  address  represents  a  temporary,  to  do 
so  would  increase  the  processing  overhead  needlessly. 

3)  Constant  address  indicators  C1-C2 

Cj is 1 if ARGj is the address of a constant. Again,
these  indicators  exist  for  efficiency  purposes.  They  aid 
the  machine  language  generators  in  determining  if  it  is 
possible  to  use  an  "immediate"  instruction. 

4)  Branch  type  indicator  BTY 

Set  whenever  the  branch  is  translated  or 
retranslated to machine language to ensure the proper
code  is  generated  (see  Section  3.4).  This  tag  bit  is 
applicable  only  to  branch  instructions.  Basically  it  is 
used  to  distinguish  whether  the  branch  is  to  a  basic 
block  that  is  external  or  internal  to  the  segment 
containing  it.  It  must  be  updated  whenever  new 
segments  are  formed  or  optimizations  applied. 

5)  Store  result  temporary  indicator  SR 

If  SR  is  1,  the  machine  language  generator  compiles 
a  store  instruction  to  force  the  storing  of  the 
temporary's associated register into the temporary. This
is  necessary  when  a  temporary  is  referenced  in  machine 
language  generated  by  the  compiler.  This  machine 
language  is  never  altered,  and  consequently  when  the 
quad  is  translated  to  machine  language,  its  result  must 
be  stored  in  the  result  temporary. 
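A small sketch may help show how a code generator could consult these tag bits. The bit positions in `Tag`, the function names, and the dictionary standing in for memory are all hypothetical; the text does not give the actual word layout. The 18-bit limit reflects the width of the PDP-10 immediate (address) field.

```python
from enum import IntFlag


class Tag(IntFlag):
    """Tag bits of a quad. Bit positions are assumed for illustration only."""
    I1 = 1 << 0
    I2 = 1 << 1
    I3 = 1 << 2
    T1 = 1 << 3
    T2 = 1 << 4
    T3 = 1 << 5
    C1 = 1 << 6
    C2 = 1 << 7
    BTY = 1 << 8
    SR = 1 << 9


def operand_address(tags, j, arg, memory):
    """Resolve ARGj: direct when Ij is 0, one level of indirection when Ij is 1."""
    indirect_bit = (Tag.I1, Tag.I2, Tag.I3)[j - 1]
    return memory[arg] if tags & indirect_bit else arg


def can_use_immediate(tags, j, arg, memory):
    """True if ARGj is a constant small enough for an 18-bit immediate field."""
    if j not in (1, 2):          # only C1 and C2 exist
        return False
    constant_bit = (Tag.C1, Tag.C2)[j - 1]
    if not tags & constant_bit:
        return False
    value = memory[arg]
    return 0 <= value < (1 << 18)
```

For example, with `tags = Tag.C2` and a constant 50 stored at `arg`, `can_use_immediate(tags, 2, arg, memory)` is true, which corresponds to choosing an immediate instruction instead of a memory reference.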
Table A.1  The List of Quad OP Codes

Octal  Mnemonic  Description                   Operation

000    NOP       No operation
001    ADD       Integer add                   ARG3 <- ARG1 + ARG2
002    FADD      Floating add
003    DADD      Double precision add
004    CADD      Complex add
005    SUB       Integer subtract              ARG3 <- ARG1 - ARG2
006    FSUB      Floating subtract
007    DSUB      D.P. subtract
010    CSUB      Complex subtract
011    MPY       Integer multiply              ARG3 <- ARG1 * ARG2
012    FMPY      Floating multiply
013    DMPY      D.P. multiply
014    CMPY      Complex multiply
015    DIV       Integer divide                ARG3 <- ARG1 / ARG2
016    FDIV      Floating divide
017    DDIV      D.P. divide
020    CDIV      Complex divide
021    FXFX      Integer to integer power      ARG3 <- ARG1 ** ARG2
022    FLFL      Floating to floating power
023    DPDP      D.P. to D.P. power
024    CXCX      Complex to complex power
025    FLFX      Floating to integer power
026    DPFX      D.P. to integer power
027    CXFX      Complex to integer power
030    NEG       Integer negate                ARG3 <- -ARG1
031    FNEG      Floating negate
032    DNEG      D.P. negate
033    CNEG      Complex negate
034    LREPL     Logical replacement           ARG3 <- ARG1
035    SREPL     String replacement
036    REPL      Integer replacement
037    FREPL     Floating replacement
040    DREPL     D.P. replacement
041    CREPL     Complex replacement
042    AND       Logical and                   ARG3 <- ARG1 ∧ ARG2
043    NOT       Logical not                   ARG3 <- ¬ARG1
044    OR        Logical or                    ARG3 <- ARG1 ∨ ARG2
045    XOR       Logical exclusive or          ARG3 <- ARG1 xor ARG2
046    EQV       Logical equivalence           ARG3 <- ARG1 ≡ ARG2
047    SEQ       String =                      ARG3 <- (ARG1 = ARG2)
050    EQ        Integer =
051    FEQ       Floating =
052    DEQ       D.P. =
Table A.1  (cont.)

Octal  Mnemonic  Description                   Operation

053    CEQ       Complex =
054    SNE       String ≠                      ARG3 <- (ARG1 ≠ ARG2)
055    NE        Integer ≠
056    FNE       Floating ≠
057    DNE       D.P. ≠
060    CNE       Complex ≠
061    SGT       String >                      ARG3 <- (ARG1 > ARG2)
062    GT        Integer >
063    FGT       Floating >
064    DGT       D.P. >
065    SGE       String ≥                      ARG3 <- (ARG1 ≥ ARG2)
066    GE        Integer ≥
067    FGE       Floating ≥
070    DGE       D.P. ≥
071    SLT       String <                      ARG3 <- (ARG1 < ARG2)
072    LT        Integer <
073    FLT       Floating <
074    DLT       D.P. <
075    SLE       String ≤                      ARG3 <- (ARG1 ≤ ARG2)
076    LE        Integer ≤
077    FLE       Floating ≤
100    DLE       D.P. ≤
101    MOD       Integer mod                   ARG3 <- ARG1 mod ARG2
102    AMOD      Floating mod
103    ISIGN     Integer sign                  ARG3 <- sgn(ARG1) * |ARG2|
104    SIGN      Floating sign
105    DSIGN     D.P. sign
106    IABS      Integer abs                   ARG3 <- |ARG1|
107    ABS       Floating abs
110    DABS      D.P. abs
111    CABS      Complex abs
112    INT       Real to integer truncation    ARG3 <- sgn(ARG1) * entier(|ARG1|)
113    AINT      Real to real truncation
114    IDINT     D.P. to integer truncation
115    IFIX      Real to integer conversion    ARG3 <- entier(ARG1)
116    FLOAT     Integer to real conversion
117    CVSI      String to integer conversion
120    CVSR      String to real conversion
121    CVSD      String to D.P. conversion
122    CVSC      String to complex conversion
123    B         Branch to ARG1
124    BGZ       Branch to ARG2 if ARG1 > 0
125    BF        Branch to ARG2 if ARG1 = false
Table A.1  (cont.)

Octal  Mnemonic  Description

126    BLZ       Branch to ARG2 if ARG1 < 0
127    BEZ       Branch to ARG2 if ARG1 = 0
130    BLEZ      Branch to ARG2 if ARG1 ≤ 0
131    STOP      Stop execution
132    NEGL      Integer conditional negate:
                 ARG3 <- if ARG2 < 0 then -ARG1 else ARG1
133    FNEGL     Floating conditional negate
134    DNEGL     D.P. conditional negate
135    EXITS     Return from subroutine
136    BLE       Branch to ARG3 if ARG1 ≤ ARG2
137    BGE       Branch to ARG3 if ARG1 ≥ ARG2
140    CALLF     Call the function at ARG1. ARG3 is the temporary
                 for the functional value. ARG2 is the address of
                 the formal parameter descriptor list.
141    EXITF     Return from function. ARG1 contains the functional
                 value.
142    JSR       Simulate PDP-10 JSR instruction. The routine is at
                 ARG1 (used to call I/O subroutines).
143    XCT       Simulate PDP-10 XCT instruction. Instruction to be
                 executed is at ARG1 (used to call external library
                 functions). ARG3 is the functional result.
144    PUSHJ     Simulate PDP-10 PUSHJ instruction. The stack used
                 is the BLISS run time stack [Bli71]. The routine to
                 be called is at ARG1 (used to call the run-time
                 support routines ADEC and PSA).
145    JUMP      Branch to ARG1 (marks end of a basic block)
146    CALLS     Call subroutine at ARG1. ARG2 is the address of
                 the formal parameter descriptor list.
147    STOPT     Stop execution if ARG1 = true
150    EXTST     Return from subroutine if ARG1 = true
151    EXTFT     Return from function if ARG1 = true. ARG2 contains
                 the functional value.
152    BT        Branch to ARG2 if ARG1 = true
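To make the three-address form of Table A.1 concrete, the following sketch interprets a handful of the quad operations. It is a simplification under several assumptions: quads are plain tuples, operands are Python integers (constants) or strings (variable names), and branch targets are indices into the quad list rather than basic block numbers. The function name `run_quads` is hypothetical.

```python
def run_quads(quads, mem):
    """Interpret (op, arg1, arg2, arg3) quads; mem maps variable names to values."""

    def val(x):
        # Strings name variables; anything else is taken as a constant.
        return mem[x] if isinstance(x, str) else x

    pc = 0
    while pc < len(quads):
        op, a1, a2, a3 = quads[pc]
        pc += 1
        if op == "NOP":
            pass
        elif op == "ADD":            # ARG3 <- ARG1 + ARG2
            mem[a3] = val(a1) + val(a2)
        elif op == "SUB":            # ARG3 <- ARG1 - ARG2
            mem[a3] = val(a1) - val(a2)
        elif op == "MPY":            # ARG3 <- ARG1 * ARG2
            mem[a3] = val(a1) * val(a2)
        elif op == "REPL":           # ARG3 <- ARG1
            mem[a3] = val(a1)
        elif op == "BLE":            # branch to ARG3 if ARG1 <= ARG2
            if val(a1) <= val(a2):
                pc = a3
        elif op == "BGZ":            # branch to ARG2 if ARG1 > 0
            if val(a1) > 0:
                pc = a2
        elif op == "JUMP":           # branch to ARG1
            pc = a1
        elif op == "STOP":
            break
        else:
            raise ValueError("unsupported quad: " + op)
    return mem


# A DO-style loop that sums the integers 1 through 10.
program = [
    ("REPL", 0, None, "S"),
    ("REPL", 1, None, "I"),
    ("ADD", "S", "I", "S"),
    ("ADD", "I", 1, "I"),
    ("BLE", "I", 10, 2),
    ("STOP", None, None, None),
]
result = run_quads(program, {})
```

The closing `ADD`/`BLE` pair has the same shape as the loop-closing quads that appear for each DO statement in the interpretive listing of Appendix B.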




Table A.2  Operand Type (TY)

Octal  Type

00     simple variable
10     array with non-adjustable dimensions
11     array with adjustable dimensions
20     function subprogram
21     subroutine subprogram
22     library subprogram
23     external subprogram


Table A.3  Operand Arithmetic (AR)

Octal  Arithmetic

0      universal
1      logical
2      string
3      integer
4      real
5      double precision
6      complex


Table A.4  Operand Class (L)

Octal  Class

1      identifier
2      constant
3      result
4      indirect result
5      parameter


Table A.5  Operand Reference (R)

Octal  Reference

0      normal variable
1      COMMON variable
2      parameter



Appendix B

Source Listings of the Test Programs and a Detailed Example


This appendix contains the source listings of all the test programs used
for validation of the system, along with the complete system output for a
matrix multiplication program. This detailed example is the same as that used
by Allen [All69].

B.1  A Detailed Example: Matrix Multiplication


A)  The Source Listing


 1.        INTEGER X(50,50), Y(50,50), Z(50,50)
 2.  C     INITIALIZE X AND Y
 3.        DO 10 I=1,50
 4.        DO 10 J=1,50
 5.        X(I,J)=I+J
 6.        Y(I,J)=MOD(I,J)
 7.  10    CONTINUE
 8.        DO 3 I=1,50
 9.        DO 3 J=1,50
10.        Z(I,J)=0
11.        DO 3 K=1,50
12.  3     Z(I,J)=Z(I,J)+X(I,K)*Y(K,J)
13.        TYPE 20,X,Y,Z
14.  20    FORMAT (5(5I5/)/)
15.        STOP
16.        END
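Each subscripted reference in statements 5, 6, 10, and 12 is compiled into an MPY quad followed by two ADD quads, which together compute a column-major element address. The sketch below illustrates that arithmetic. The function name is hypothetical, and the value of the constant folded into the last ADD (the array base minus 51, which corrects for subscripts starting at 1) is an assumption that cannot be read reliably from the listing.

```python
ROWS = 50  # leading dimension of X, Y, and Z


def element_address(base, i, j, rows=ROWS):
    """Address of A(I,J) for a column-major array whose first element is at base."""
    t0 = j * rows                   # (MPY, J, 50, T0)
    t1 = t0 + i                     # (ADD, T0, I, T1)
    t2 = t1 + (base - rows - 1)     # (ADD, T1, constant, T2); constant is assumed
    return t2
```

With this layout `element_address(base, 1, 1)` yields `base`, stepping I by one moves one word, and stepping J by one moves 50 words, so the J*50 term does not change while only K or I varies.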


B)  Listing of Source with Interpretive Code


Af  OR TRAN 

VERSION:  110172 

12  4/72  P3/4 

PAGE  1-1 

03 

00 

00 

00  000000 

7/7777777/77 

OATA 

TRUE 

03 

00 

00 

00  000C0I 

ooooooococco 

OATA 

r*isc 

Ml 

►  ••  * 

If*  « 

101 0CK  1 

63 

Cl 

00 

00  OOCCOI 

000000000000 

OATA 

eoooooooooeo 

00!  00 

1. 

INTEGER 

x(50.50).  Y150,5C>,  715050) 

00?00 

2 

c 

INITIAL t/F  X  AND  V 

CO3C0 

3 

00  10 

•  1.50 

03 

00 

00 

00  C00003 

oocooo^oocoi 

OATA 

eooooooooooi 

©WOO 

4 

00  10  J 

'•1.5© 

03 

00 

00 

00  000004 

OOC 000000067 

OATA 

000100000062 

Cl 

C3 

00 

05  000C0C 

1  *0000000003 

oooooe  oocooo 

(RE  PI 

.  oooooooooei  .  e  ,  i 

01 

cc 

oc 

00  00000? 

6740000CCC02 

eocooo  oooooo 

(JUMP 

,  000002  e  .  © 

Ml 

KM 

If  •< 

I01OCK  ? 

63 

01 

00 

00  000OC? 

000000000004 

DATA 

000000000004 

©0500 

5. 

x(um«j 

01 

03 

00 

05  000004 

17 OCC 0000003 

OCOOOO  000001 

(REPL 

,  000000000© 1  .  ©  .  J 

01 

00 

00 

OI'  000006 

674000000003 

oooooo  oooooo 

(JUMP 

,  ©00003  .  ©  © 

•  •i 

K»l 

Itll 

■9L0CK  3 

63 

01 

00 

00  000003 

000 OCOOOO© 10 

OATA 

oooooooooo i© 

03 

0? 

00 

00  000*05 

777777777717 

OATA 

00000000006 1 

01 

05 

03 

04  0CC010 

044000000001 

000004  oooooo 

(MPY 

,  J  ,  0000000005©  ,  T  ©$ 

01 

04 

05 

04  000012 

004CC0O0C0OC 

OCOOOO  OOOOOI 

(AO© 

.  T©$  ,1  ,  T  |$ 

01 

04 

03 

0.1  0000:4 

004000000001 

G00003  OC  3002 

(AOO 

.  TK  -00000000003  .  m 

©0600 

6 

y(IJ).MCC(U) 

©1 

05 

05 

04  000016 

004  0  0  000  000  0 

COOOOI  0C00C3 

(AGO 

.1  .  J  ,  T3t 

01 

04 

00 

04  C00020 

I7CCQI OCOOC3 

OOOOCO  000002 

(BE  PI 

,13*  ,0  *12* 

03 

05 

00 

00  000006 

OOOCOOC04623 

OATA 

000900004623 

91 

05 

03 

04  00002? 

044000000001 

C00004  OCOOC4 

(MPY 

,  J  .  00000000050  ,  10* 

01 

04 

05 

04  000024 

004000000004 

OCOOOO  000005 

(•DO 

,  t«»  .i  ,  is* 

c, 

04 

03 

04  00002G 

004000000005 

OOOOOO  000006 

(•00 

,  T5*  ,  00000002051  .  T6* 

©0700 

7. 

1©  CONTINUE 

©1 

05 

05 

04  000030 

404000000000 

000001  C 00007 

(MOO 

.1  ,  J  ,17* 

01 

04 

00 

04  00003? 

170001000007 

OOOOOO  000006 

(REPL 

,17*  ,0  *16* 

00800 

8 

00  3  I. 

1.50 

01 

00 

00 

CO  OOOC34 

674000000004 

oooooo  oooooo 

(JUMP 

000004  ,  ©  ,  © 

•  «l 

IMI 

Iff! 

►BLOCK  4 

63 

01 

00 

00  CO 0004 

000000000036 

OATA 

000000000036 

03 

00 

00 

00  000002 

0CC00C0CC004 

000 10 

OATA 

00  OOOOO06004 

01 

05 

03 

05  000036 

004  0  0CC00001 

00 0003  OCOOOl 

(•00 

,  J  .  00000000001  .  J 

01 

05 

03 

00  000040 

5700C0CC00CI 

000004  000003 

(BLE 

,  J  ,  ©000000005©  .  000003 

01 

00 

00 

00  00004? 

67400000C005 

oooooo  oooooo 

(JUMP 

.  000005  ,  ©  .  © 

•  •1 

MM 

»••< 

►BLOCK  5 

63 

01 

00 

00  000005 

0CC000000044 

OATA 

CO 000 ©00004 4 

01 

05 

03 

05  000034 

004000000000 

000003  oooooo 

(AOO 

,  i  ,  oeooooooeoi  ,  i 

01 

05 

03 

©0  000046 

570000000000 

000004  000002 

(BLE 

.  1  ,  ©©©©©00005©  ,  © 00002 

©1 

00 

00 

©0  000050 

624000000006 

oooooo  oooooo 

(JUMP 

.  000006  0  ,  « 

»»*i 

►  *•< 

►BLOCK  G 

63 

©1 

00 

00  OOOCC6 

00000000005? 

OATA 

000000000052 

©©900 

9 

00  3  J- 

1.50 

01 

03 

00 

05  000052 

170000000003 

oooooo  oooooo 

(REPL 

oooooooooei  ,  ©  ,i 

01 

00 

00 

00  000054 

674000000007 

oooooo  oooooo 

(JUMP 

000007  .0  ,  © 

*  *1 

Mil 

>••1 

►BIOCK  ? 

63 

©I 

©e 

00  00000/ 

COOOOOOOOC56 

OATA 

000000000  >56 

01000 

10. 

Z(U).C 

01 

03 

CO 

05  C0C056 

170000000003 

OOOOOO  OOOOOI 

(REPL 

.  00000000001  .  ©  ,  J 

01 

0© 

00 

00  000060 

674000000010 

OOOOOO  OOOOOO 

(JUMP 

,  oooei©  .  ©  ,  © 

*• 

*••1 

l*t< 

►BLOCK  8 

63 

01 

00 

00  000010 

C00000000062 

OATA 

000000000062 

03 

05 

00 

00  000010 

00000001 lr?7 

OATA 

00000001 1527 

01 

05 

03 

04  OOOOG? 

044000000001 

000004  ocooco 

(MPY 

,  J  ,  00000000050  ,  It* 

01 

04 

05 

04  000064 

004000000000 

OOOOOO  OOOOOI 

(AOO 

,  TO*  ,1  ,  It* 

01 

04 

03 

04  000066 

004000000001 

000010  000002 

(AOO 

,  19*  ,  00000000951  ,  TIM 

©1  100 

1  I. 

00  3  K« 

■1.50 

©3 

00 

00 

00  00001 1 

ooooooocoooo 

OATA 

oooocooeoeeo 

01 

03 

00 

04  000070 

17000100001 1 

OOOOOO  000002 

(REPL 

,00000000000  ,  0  ,11 IM 

©120© 

12. 

J  Z(IJ).?(IJ).X(IW.Y(«J) 

01 

03 

©0 

©5  000072 

170000000003 

OOOOOO  0165 16 

(REPL 

,  ©©eoeeooo©]  ,  ©  ,t 

ei 

00 

©0 

00  000074 

62400000001 1 

OOOOOO  oooooo 

(JUMP 

,  ooooi  i  ,o  ,o 

P3/4 
eiaoo


eioeo 


91599 
•  1600 


•  • 

•BLOCK  9 

63 

01 

00 

00 

0000 1 1 

000000000076 

03 

00 

00 

00 

000007 

00000000001 1 

01 

05 

03 

00 

000076 

004000000001 

OOCOO0 

OOOOOi 

01 

00 

03 

00 

OCOIOO 

004000000000 

000000 

000C5 

01 

00 

CJ 

00 

000102 

0000000^0001 

000010 

00000. 

01 

05 

03 

00 

000100 

00400000000 1 

000000 

000003 

01 

00 

05 

00 

000106 

000000000003 

OOOOPO 

000000 

01 

00 

03 

00 

000110 

000000000000 

000010 

000Cu5 

01 

05 

03 

00 

0001 1? 

00  4  000016516 

000004 

000006 

01 

00 

05 

00 

0001 10 

CO0OOOOOCOCC 

occoco 

000007 

01 

00 

03 

00 

0001 16 

000000000007 

0CC005 

0000 10 

13. 

TYPE 

?oxy; 

ei 

05 

03 

00 

000120 

000 ooooooooi 

OOCOO0 

00001 1 

01 

00 

05 

00 

000122 

000  0000000  1 

016516 

OCOO 1 ? 

01 

00 

03 

00 

000124 

0C40000C00I? 

000006 

000013 

01 

00 

00 

00 

000126 

004006 OCOO 10 

000013 

000010 

01 

00 

00 

00 

000130 

000000000005 

OCOO  10 

000015 

01 

00 

00 

00 

00013? 

17000 10000 15 

000000 

000002 

01 

05 

03 

05 

000130 

OM0000165I6 

000003 

0)6516 

01 

05 

03 

00 

000136 

5700000i6516 

000000 

0000 1 1 

01 

oo 

00 

00 

000100 

620000000012 

000000 

000000 

•>l 

•  >■1 

i«*i 

•BLOCK  10 

63 

01 

00 

00 

00001? 

000000000102 

01 

0* 

03 

05 

00010? 

004000000001 

000003 

OOOOOI 

01 

05 

03 

00 

000 i 00 

570000000001 

000000 

000010 

01 

00 

00 

00 

000106 

624000000013 

0000C3 

000000 

•  •1 

mi 

IMI 

•BLOCK  11 

63 

01 

00 

00 

000013 

000000000150 

01 

05 

03 

05 

000150 

000000000000 

000003 

cooooo 

01 

05 

03 

00 

000152 

57000000000/ 

000000 

000007 

01 

00 

00 

00 

000150 

C20OOOOOOO10 

000009  000000 

•  •1 

•  *»l 

•BLOCK  12 

63 

01 

00 

00 

000010 

000C000001 56 

02 

00 

00 

00 

000000 

000000000000 

02 

63 

06 

00 

eeoooi 

200000000012 

02 

00 

00 

00 

000002 

OI7O0O777777 

10. 

20 

F  OHMAT  (5(515/)/) 

02 

05 

00 

00 

000003 

025000000002 

©2 

00 

00 

00 

000000 

32000000a 7oa 

62 

03 

00 

00 

000005 

025000000706 

02 

00 

00 

00 

000006 

320000000700 

02 

05 

00 

00 

000007 

02500001 16 12 

02 

00 

00 

00 

oooo ie 

320000004  700 

02 

00 

00 

00 

0C00I1 

07 1 000000000 

02 

0? 

00 

00 

00001? 

250 120000 

01 

02 

eo 

00 

000156 

6100000 

^0 

000000 

03 

02 

00 

ee 

000012 

oocooc 

02 

Of 

00 

ee 

000013 

?015?5O'JK_ 

02 

00 

00  00 

000010 

325365127522 

IB. 

STOP 

16. 

END 

01 

00 

00 

00 

000160 

500000000000 

000000 

000000 

01 

00 

00 

00 

00016? 

620000000015 

000000 

000000 

■  ■■ 

BLOCK  13 

63 

01 

00 

00 

000015 

000000000160 

01 

00 

00 

ee 

000160 

500000000000 

000000 

000000 

01 

00 

00 

00 

000166 

620000000000 

000000 

000000 

01 

00 

00 

66  666176 

eeeoeooooooi 

00063 


00020 


OAT  A  0000000004,76 

OATA  0060000000 1  J 


(MPY 

.  J 

.  00000900959 

.Tilt 

(ADO 

,  Tilt 

,  1 

.  T  12* 

(ADO 

.  T  l?t 

,  00000000951 

.  T  13* 

(MPY 

.  J 

,  00000090059 

.  T  14* 

(ADD 

.  T|0t 

,  1 

.  T  15* 

(ADO 

,  T  151 

,  00009000951 

.  T  16* 

(MPY 

.  K 

.  000CC000050 

,  TI7* 

(ADO 

,  T  |7S 

,  I 

.  TIB* 

(ADO 

,  TIB* 

,-99009999949 

.  T  19* 

(MPY 

.  J 

,  99090999059 

.  T29* 

(ADO 

,  T20* 

,  K 

.  T2I* 

(A  00 

,  T?  IB 

,  00000002451 

.  T22* 

(MPY 

^T  191 

,»T??t 

.  T23* 

(ADO 

*Tl6t 

,  T?3» 

.  T24* 

(REPL 

.  T?0* 

,  9 

jiT  139 

(AOO 

.  K 

,  09909099901 

.  K 

(BLE 

.  K 

,  99000090959 

.  999911 

(JUMP 

,  9000 1? 

,  9 

.  9 

DATA 

eooo oeooe 142 

(AOO 

.  J 

,  9909000000 1 

.  J 

(BLE 

,  J 

,  99009900059 

,  099919 

(JUMP 

.  000013 

,  0 

.  9 

DATA 

000000066150 

(ADD 

,  1 

,  09000099991  , 

,  1 

(BLE 

,  1 

,  00999090059  , 

999997 

(JUMP 

,  000014 

.  9 

,  9 

OATA 

000000000156 

OATA 

000000000909 

MOVE 

01,00020* 

OUT. 

01,777777 

SLIST. 

eox 

ARG 

©0,000  704 

SLIST. 

•O.Y 

ARG 

00,094704 

SLIST. 

002 

ARG 

00,000  7  00 

FIN. 

00,000000 

JRST 

©2.*  090009 

(JSR 

.  0 

,  9 

■  9 

OATA 

000099999013 

TEXT 

(SCSI 

TEXT 

5/)/) 

(STOP 

,  0  ,6 

.  9 

(JUMP 

,  600015  ,  0 

.  9 

OATA 

000660666 164 

(STOP 

,6  ,6 

,  9 

(JUMP 

.6  ,1 

.  9 

START 

666666666661 
E)  Listing of Code Optimizations


The  program  was  constructed  so  that  the  entire  main  loop 
would  become  totally  optimized.  The  main  loop  consists  of 
statements  8  thru  12,  or  basic  blocks  6  thru  11.  The  partial  listing 
of  the  optimized  code  given  below  is  only  for  these  blocks. 


The  optimization  of  the  program  can  be  summarized  as  follows: 

1)  Fusion  of  segment  9  (basic  block  9) 

Since  the  main  loop  consists  of  three  nested 
DO  loops,  the  corresponding  segments  will  be 
optimized  in  the  order  they  are  embedded, 
starting  with  the  innermost  one.  Thus,  segment 
9  is  optimized  first,  and  since  it  is  the 
innermost  segment,  it  is  fused  to  "dumb"  code 
after  being  executed  once  interpretively.  Since 
the  segment  is  not  yet  totally  optimized,  the 
conditional  BLE  branch  to  itself  is  processed  by 
the  segment  driver  so  the  segment’s 
optimization  count  will  be  decremented. 

2)  Code  Motion  on  segment  9 

Segment  9  is  now  totally  optimized.  First 
CSE  is  performed  on  each  of  its  basic  blocks 
(one  in  this  case).  Four  redundant 
subexpressions  are  removed  from  basic  block  9: 
the  4th,  5th,  6th  and  10th  quads.  The  first 
three  represent  the  second  subscript  calculations 
for  Z(I,J),  while  the  fourth  involves  the  common 
subscript  calculation  for  J.  Also,  the  15th  quad 
is  combined  with  the  14th,  eliminating  the 
intermediate  temporary.  Then  code  motion 
results  in  three  calculations  involving  the 
segment  invariants  I  and  J  being  moved  to  the 
front  of  the  segment:  the  1st,  2nd,  and  3rd 
quads.  Unique  temporaries  are  assigned  to  the 
results  of  these  invariant  quads,  and  they 
replace  the  original  temporaries.  Thus  the 
result  of  the  1st  quad  is  used  both  in  the  2nd 
quad and the 11th quad, and the 3rd quad's
result  is  used  in  the  14th  quad.  Finally,  the 
resulting quads are compiled to "fair" code.

Notice  that  the  conditional  BLE  branch  is 
now  direct  and  to  the  alternate  entry  point  of 
the  segment,  i.e.,  to  the  point  after  the 
invariant  code. 

3)  Translating  basic  blocks  8  and  10 

The  remaining  basic  blocks  of  the  yet 
unformed segment 8 are now translated to
"dumb"  machine  language. 

4)  Fusion  of  segment  8  (basic  blocks  8-10) 

Next,  segment  8  is  formed.  The  machine 
language is non-homogeneous with respect to the
degree  of  optimization  performed  on  its  basic 
blocks:  the  embedded  segment  9  is  already 
totally  optimized,  while  the  rest  of  its  blocks 
have  been  translated  to  "dumb"  code.  Since 
the  machine  language  for  all  the  segment’s  basic 
blocks  exists,  it  is  only  necessary  to  combine 
the  machine  language  for  each  block,  at  the 
same  time  retranslating  the  branches.  Notice 
that  no  code  is  generated  for  the  intra-segment 
JUMPs, and the direct branch of segment 9 now
reflects  where  the  new  alternate  entry  point  is 
located. 

5)  Code  Motion  on  segment  8 

Finally,  segment  8  is  totally  optimized.  This 
means  performing  CSE  on  basic  blocks  8  and 
10,  then  code  motion  on  the  entire  segment. 
These  optimizations  have  no  effect.  When 
forming  the  machine  language  segment,  basic 
blocks  8  and  10  are  compiled  to  "fair"  code, 
but  since  the  machine  language  for  segment  9 
already  exists,  it  need  only  be  moved.  The 
branches  of  each  basic  block  are  again 
retranslated, resulting in the conditional BLE
branch  of  block  10  being  made  direct. 

6)  Optimization  of  segment  7  (basic  blocks  7-11) 

The optimization for this segment proceeds
in a straightforward manner, as for segment 8.
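The effect of step 2 on basic block 9 can be reproduced with a small sketch of local common subexpression elimination followed by invariant code motion. This is not the system's implementation. The tuple form of the quads, the `*T` notation for indirect operands, the base constants, the rule that names starting with `T` are single-assignment temporaries, and both function names are assumptions made for illustration. The combining of the 15th quad with the 14th and the renaming of hoisted temporaries are not modeled.

```python
from collections import Counter

PURE_OPS = {"ADD", "SUB", "MPY"}  # arithmetic quads with no side effects


def _indirect(x):
    """True for operands written '*T', i.e. accessed through the address in T."""
    return isinstance(x, str) and x.startswith("*")


def _rename(x, alias):
    """Replace an eliminated temporary by the temporary that survives it."""
    if not isinstance(x, str):
        return x
    prefix = "*" if x.startswith("*") else ""
    name = x.lstrip("*")
    return prefix + alias.get(name, name)


def eliminate_cse(block):
    """Local CSE over one basic block of (op, arg1, arg2, result) quads."""
    available, alias, out = {}, {}, []
    for op, a1, a2, res in block:
        a1, a2 = _rename(a1, alias), _rename(a2, alias)
        if _indirect(res):
            res = _rename(res, alias)
        key = (op, a1, a2)
        pure = op in PURE_OPS and not any(_indirect(x) for x in (a1, a2, res))
        if pure and key in available:
            alias[res] = available[key]  # reuse the earlier result, drop the quad
            continue
        out.append((op, a1, a2, res))
        if isinstance(res, str) and not _indirect(res):
            # res is (re)defined here: forget every fact that mentions it.
            available = {k: v for k, v in available.items()
                         if v != res and res not in k[1:]}
            alias = {k: v for k, v in alias.items() if res not in (k, v)}
            if pure and res not in (a1, a2):
                available[key] = res
    return out


def hoist_invariants(block):
    """Split a loop body into (invariant quads, remaining quads)."""
    defs = Counter(res for *_, res in block
                   if isinstance(res, str) and not _indirect(res))
    variant = set(defs)  # names assigned somewhere inside the loop
    hoisted, body = [], []
    for quad in block:
        op, a1, a2, res = quad
        fixed = all(not isinstance(x, str) or (not _indirect(x) and x not in variant)
                    for x in (a1, a2))
        is_temp = isinstance(res, str) and res.startswith("T")
        if op in PURE_OPS and fixed and is_temp and defs[res] == 1:
            hoisted.append(quad)
            variant.discard(res)  # later quads may now treat res as invariant
        else:
            body.append(quad)
    return hoisted, body


Z_BASE, X_BASE, Y_BASE = 1000, 4000, 7000  # hypothetical address constants

# Quads modeled on statement 12: Z(I,J)=Z(I,J)+X(I,K)*Y(K,J), plus loop closing.
block9 = [
    ("MPY", "J", 50, "T11"), ("ADD", "T11", "I", "T12"), ("ADD", "T12", Z_BASE, "T13"),
    ("MPY", "J", 50, "T14"), ("ADD", "T14", "I", "T15"), ("ADD", "T15", Z_BASE, "T16"),
    ("MPY", "K", 50, "T17"), ("ADD", "T17", "I", "T18"), ("ADD", "T18", X_BASE, "T19"),
    ("MPY", "J", 50, "T20"), ("ADD", "T20", "K", "T21"), ("ADD", "T21", Y_BASE, "T22"),
    ("MPY", "*T19", "*T22", "T23"), ("ADD", "*T16", "T23", "T24"),
    ("REPL", "T24", 0, "*T13"),
    ("ADD", "K", 1, "K"), ("BLE", "K", 50, 9),
]

after_cse = eliminate_cse(block9)
surviving = {q[3] for q in after_cse}
eliminated = [q[3] for q in block9 if q[3] not in surviving]
invariant, loop_body = hoist_invariants(after_cse)
```

Under these assumptions the sketch removes the 4th, 5th, 6th, and 10th quads and moves the 1st, 2nd, and 3rd quads ahead of the loop, which agrees with the counts reported in step 2. Hoisting is safe here only because a FORTRAN IV DO loop body executes at least once and each hoisted temporary is assigned exactly once.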
FUSED  BLOCKS  1

)  THRU 

9 

032242 

(MPV 

052376 

03734  |  ,  032516) 

053342 

MOVE 

04,  052376 

TRANSLATING  BLOCK  9 

053343 

1MULI 

04  .  000062 

037226 

(MPY 

,  03366  1  ,  037341  032530) 

032244 

(ADD 

,  032536 

C33C6  0  032537) 

053265 

MOVEI  04  CC0062 

053344 

ADD 

04  033660 

053 266 

IMUl  04.  033661 

032246 

(ADD 

.  012537 

,  032342  ,  032540) 

•32230 

(ADD 

032530  ,  033660  03253  1) 

032250 

(NOP 

ocecto 

COOCOO  .  OOOOCC) 

053267 

ADD  04,  033660 

032252 

(ADD 

,  032546 

.  052376  ,  032542) 

032232 

(ADO 

032531  032345  032532) 

053345 

MOVE 

05  032046 

053270 

ADDl  04  ,  045407 

053346 

ADD 

05  052376 

037234 

(MPV 

,  03366 1  .  032341  C32533) 

0J22* d 

(ADD 

,  032542 

,  032343  ,  032543) 

0532  71 

MOVE  1  05  000062 

032256 

(MPV 

,*032540 

•  032543  ,  032544) 

0532  72 

IMUl  05,  03336  1 

053347 

MOVE 

04  033577(04) 

037736 

(AOD 

,  032533  033660  ,  0325341 

053350 

IMUl 

04.  040503(05) 

053273 

ADO  05,  0336  6  0  j 

032260 

(ADD 

,*032550 

,  032544  #032550) 

032240 

(ADO 

032534  032345  017535) 

053351 

ADDO 

04*032550 

053274 

ADDl  05  04540/  j 

0322G2 

(NOP 

,  OOOOOO 

COOOCO  ,  OOCOOO) 

©3274? 

(MPV 

052376  032341  ,  032536) 

032264 

(ADD 

,  057376 

,  037340  052376) 

053275 

MOVE'  06,  C00062 

053352 

AOS 

05,  C52376 

053276 

IMUl  06.  052 J  76 

012266 

(OLE 

,  052376 

03234  1  ,  0000  f  1) 

032744 

(ADO 

032536  ,  033660  032537) 

053353 

CAMC 

C5  032341 

053277 

ADD  OG  031660 

053354 

JRST 

00  053342 

032246 

(AOO 

032537  ,  032342  032540) 

032270 

(JUMP 

,  OCOOI2 

OOCOOO  OOOOOO) 

053300 

ADDl  06  03 J 57  7 

053355 

MOVE  1 

15,  OOC012 

037250 

(MPV 

033661  037341  ,  03  7  54  1) 

053356 

POPJ 

17  OOOOOO 

C5330I 

MOVEI  07,  OOCOG? 

QUADS  COMPILED  •  14 

©53302 

IMUL  07  033661 

i 

032252 

(ADD 

,  037541  ,  052376  C3254?) 

TRANSLATING  BLOCK  6 

053303 

ADD  07  0523/6 

032212 

(MPV 

.  C33G6I 

03234  1  ,  032530) 

032254 

(ADD 

,  032542  ,  032343  032543) 

053357 

MOVEI 

04,  CC00G2 

053304 

ADDl  0  7  04  05  03 

053360 

IMUl 

04  .  0-366 1 

037256 

(MPV 

,•03254 ©  ,*037543  037544; 

032214 

(ADD 

,  032530 

,  C33650  ,  032531) 

053305 

MOVE  06,  OCOOOCXOO) 

053361 

ADD 

04  03366C 

053306 

IMUL  06  *000007 

032216 

(ADD 

032531 

.  032345  ,  032532) 

032266 

(ADD 

,■037535  ,  037544  ,  032545) 

053362 

ADDl 

04  .  045407 

053307 

MOVE  05  000000(05) 

032220 

(REPL 

,  032346 

,  OOOOOO  032532) 

053310 

ADD  05,  000006 

053363 

MOVE! 

05,  OOOOOO 

032262 

(REPL 

,  032545  ,  OOCOOC  *032532) 

053364 

MOVEM 

05.'  000004 

0533 1 1 

MOVEM  05  *000004 

032222 

(REPL 

,  032340 

,  OOOOOO  052376) 

032264 

(ADD 

,  052376  037340  057376) 

053365 

MOVEI 

05.  OOOOO  1 

0533  2 

MOVEI  07  00000 1 

053366 

MOVEM 

05,  052376 

053313 

ADD  0/,  0323  76 

032224 

(JUMP 

.  OCOOII 

,  OOOOOO  OOOOOO) 

053314 

MOVEM  07,  C 52375 

0533G7 

MOVE! 

15,  0000 1 1 

037266 

(BLE 

,  052376  C3734I  0000 1 1) 

053370 

POPJ 

17,  OOOOOO 

053315 

MOVEI  15,  ccco;i 

QUADS  TRANSLATED  - 

6 

053316 

CAM6  07  C37341 

•53317 

POPJ  1  7  000000 

|  TRANSLATING  BLOCK  10 

032220 

(JUMP 

,  000012  COOOOO  000000) 

032272 

(ADO 

.  033661 

032340  ,  033661) 

053320 

MOVEI  15,  COOO i 2 

053371 

MOVEI 

04,  000001 

053321 

POPJ  17,  OOCOOO 

053372 

ADD 

04,  013661 

QUAOS  TRANSLATED  • 

II 

053373 

MOVEM 

M  Cl 366 1 

032274 

(01  L 

,  03366 1 

,  032341  000010) 

COOt  MOTION  ON 

SEGMENT 

9 

05JJ74 

MOVEI 

15,  000010 

053375 

CAMC 

04  ,  037341 

CSC  ON  BLOCK 

9 

)533 76 

POPJ 

17,  OOOOCC 

QUAOS  ELIMINATED  • 

4 

C3227G 

(JUMP 

,  000013 

OOOOOO  OOOOOO) 

QUAOS  REMOVED  •  3 

053377 

MOVEI 

15,  C000I3 

053400 

POPJ 

17,  COOOCO 

COMPIING  BLOCK 

9 

1  QUADS  TRANSLATED  • 

3 

053322 

(MPV 

,  033651  ,  03234  032346) 

053332 

MOVE  Ofl.  0336GI 

FUSED  BLOCKS 

8  THRU 

10 

053333 

IMUl  1  C'i  CC00T.2 

053334 

MOVEM  04  017546 

MOVING  BLOCK 

8  TO  053401 

#53320 

(ADD 

,  032546  ,  033650  032547) 

032224 

(JUMP 

,  00001 1 

,  OOOOOO  OOOOOO) 

053335 

MOVE  04,  032546 

053336 

ADO  04  03366© 

MOVING  BLOCK 

9  TO  053411 

053337 

MOVEM  04,  032547 

032266 

(BLE 

,  052376 

,  032341  ,  00001 1) 

•53328 

(ADD 

,  032542  ,  032343  ,  032350} 

053432 

CAMG 

05,  032341 

053340 

ADDl  04  ,  04  54 ©7 

053433 

JRST 

00,  053421 

053341 

MOVEM  04  037550 

032770 

(JUMP 

,  000012 

OOOOOO  ,  OOOOOC) 

053330 

(JUMP 

0000 11  ,  C37276  ,  032266) 

032228 

(NOP 

,  000000  ,  OOOCCO  OOCOOO) 

MOVING  BlOCK 

10  TO  053434 

•32238 

(NOP 

,  ©ooooo  oooooo  ,  oeooco) 

032274 

(OLE 

,  033661 

,  03234  1  ,  0000 10) 

032232 

(NOP 

OOCCOO  000000  000000) 

053437 

MOV  , 

15,  000010 

032234 

(NOP 

,  oCOOOO  ,  000000  000000) 

053440 

camg 

04  032341 

832238 

(NOP 

,  OOOOOO  000000  000000) 

053441 

POPJ 

17,  OOOOOO 

032240 

(NOP 

000000  OOCOOO  ©ooooo) 

032276 

(JUMP 

000013 

OOOOOO  OOOOOO' 
o^saa?  Movti  15  ooooia
0530  <13  POPJ  17  000000 

CODE  MO)  UN  ON  SEGMENT  I 

CSE  ON  BLOCK  g 

QUADS  ELIMINATED  •  0 

CSE  ON  BLOCK  10 

QUADS  ELIMINATED  •  0 

NO  QUADS  REMOVED 


COMPILING  BLOCK 

03??  12 

• 

(MPV 

033G6I 

03234  1  .  037  53  0) 

053644 

MOVE 

04  C33661 

053446 

IMULI 

04  ,  000X2 

032?  14 

(ADO 

,  03?530 

033660  032531) 

053446 

hTO 

04  033660 

03??  16 

(ADD 

03*531 

032345  C32532) 

037770 

(«EPl 

03?  J4  r> 

OOOOOO  .•  03753?) 

Of  344? 

SHIM 

00  04560/(04) 

03???? 

(ftfPl 

037340 

OOOOOO  ,  0573  76) 

053450 

MOVt  1 

04,  000001 

053451 

MO  VIM 

04  057376 

037778  (JUMP  COOOll  000000  C 00000) 
QUADS  COMPILEO  •  6 


moving  BLOCK  9  TO  ossas? 

037766  (BLE  057376  ,  037341  00001 I) 

0534  73  CAMG  05  037341 

053474  JRS)  CO  0534G? 

037770  (JUMP  0000!?  ,  000000  ,  000100) 

COMPILINC  BLOCK  10 

1)3777?  (ADO  ,  033661  ,  037340  ,  033661) 
053475  AOS  00,  C336G I 

037774  (BLE  0336G I  ,  037341  ,  000010) 

0534)6  CAMG  00  037341 

053477  JUST  00  C53440 

0377/G  (JUMP  ,  G000I3  .  000*00  OCOOOO) 
053500  MOVEI  15,  000013 

053301  PDPJ  17  000000 

QUADS  COMPILED  .  3 

TRANSLATING  BLOCK  7 


0377W3 

(RCPl 

,  037340 

OCOOOO  033661) 

053502 

MftVPI 

04.  000001 

0535^ 

MOVEM 

04  .  033661 

03??  10 

(JUMP 

,  OCOOIO 

OOOOOO  OOOOOO) 

053504 

MOVEI 

15,  000010 

053505 

POPJ 

17  OOOOOO 

QUAOS  TRANSLATED  . 

? 

TRANSLATING  BLOCK  II 

637300 

(AOD 

.  033660 

.  037340  033660) 

053506 

MOVEI 

04  00000! 

053507 

ADO 

04,  033GG0 

053510 

VOVtM 

04  ,  033660 

03730? 

(BLE 

.  033GG0 

037341  ,  000007) 

05351 1 

MOVEI 

15.  000007 

05351? 

CAMC. 

C4,  03734  1 

053513 

POPJ 

17  COCCOO 

032304 

(JUMP 

,  000014 

.  OOOOOO  ,  OOOOOO) 

053514 

MOVEI 

15.  000014 

053515 

POPJ 

17  OOOOOO 

QUADS  TRANSLATED  .  ? 


FUSEO  BLOCKS  7  THRU  II 

MOVING  BLOCK  7  10  053516 

037710  (JUMP  000010  600000  000000) 

MOVING  BLOCK  g  TO  053570 

037774  (JUMP  0000 1 1  COOOOO  000000) 

MOVING  fliOCK  9  TO  093576 

037766  (BLE  ,657378  037341,000011) 


053547  CAMG  05  037341 

053550  JRS)  00  053536 

037776  (JUMP  ,  600017  000006  ,  600000) 

MOVING  BLOCK  10  TO  053551 

037774  (BLE  ,  033661  ,  037341  000010) 

053557  CAMG  04,  037341 

053553  JUST  00,  053570 

037776  (JUMP  ,  000013  000000  ,  000000) 

MOVING  BLOCK  ||  TO  053554 

037307  (BLE  ,  033G60  037341  006007) 

053557  MOVEI  15,  000007 
053  SCO  CAMG  04  037341 

0535b!  POPJ  17  000600 

037304  (JUMP  .  0000 14  ,  000000  600000) 

053567  MOVEI  15.  000014 

053563  POPJ  17,  000000 

CODE  MOTION  ON  SEGMENT  7 

CSE  ON  BLOCK  7 

QUAOS  ELIMINATED  •  0 

CSE  ON  BLOCK  I  I 

QUADS  ELIMINATED  •  D 
NO  QUADS  REMOVEO 

COMPILING  BLOCK  7 

03770G  (RE PL  ,  037340  ,  000000  033661) 

053564  MOVEI  04,  000601 
053565  MOVEM  04  ,  033661 

037710  (JUMP  ,  000010  ,  OOOOCO  000006) 

QUADS  COMPILED  .  7 

MOVING  BLOCK  6  ID  053566 

037774  (JUMP  ,  00001  I  OOOOOO  ,  000000) 


MOVING  BLOCK  9  TD  053574 


032266 

(BLE 

,  052376 

,  032341  ,  00001 1) 

0!  1015 

CAMG 

05,  03234* 

0bJ6 16 

JQST 

00.  053604 

03??;o 

(JUMP 

000012 

OOOOOO  ,  OOOOOO) 

MOVING  BLOCK 

10  TO  653817 

032274 

(BLE 

,  03366  I 

,  032341  ,  000010) 

053620 

CAMG 

04,  03734  1 

053G?  1 

JRST 

00,  053566 

032276 

(JUMP 

,  000013 

OOOOOO  OOOOOO) 

COMPILING  Bt OCK 

1  1 

032300 

(ADD 

.  033660 

037340  ,  033560) 

0536?? 

AOS 

04  .  033660 

03230? 

(BLE 

,  033660  . 

,  032341  ,  0000)7) 

053623 

CAMG 

04  032341 

053674 

JRST 

00.  053564 

037304 

(JUMP 

,  0C9014  , 

OOOOOO  ,  OOOOOO) 

053625 

MOVEI 

15.  000014 

05362* 

POPJ 

17,  OOOOOO 

QUADS  COMPILED  .  3 
B.2  The Linear Equation Solver: LES


1 

C 

MATRIX  iNVi«SON  US  AG  A  |QUATiON  SOLVES 

7 

C 

J 

c 

9E*S.  1)  A,GOR'*HM  Q73ICACM  !5.4(APRi  \V??)27A) 

a. 

c 

?)  »0«SYTHtGi.  AND  MOvESCJI  "COMPUTE*  SClUTiON 

5. 

c 

O'"  I'MAR  A^LUWAiC  Yt»T fcVS",  PMN’CEhAU 

6 

c 

fV.if  *-000  ct  J.  !'3C /. 

1 

c 

31  G&1  C0»v  R  1  AND  *ARStVi)t  >  COLLECTION  Of 

8 

c 

VA  «Lt5  fO  TEST^G  CCmputa’iONA.  ALGORITHMS 

9 

c 

wiuv  'AT{C  iCHNCt.  NF w  VO**  iy69 

10 

c 

1  1 

c 

SUOaOUHNES  USED  A«|  NOT  ThOOF  OVEN  IN  TMt  TEXTBOOK.  BUI 

17 

r 

TMf  R!  P|  AClMf  NTS  GIVEN  0V  MC  E«  'N  ALGOU'TMM  423 

13. 

c 

Q 

REAL  A(  CO  1 00)6(1  CO) 

iNTCGF  8  i  OCT 

1C 

REAOf  I2)h 

17 

2 

F  CRM  A  T(fjX  1?} 

18 

TVP|  3N 

19 

3 

f C»MAT(  N  .  i?J 

70 

N.i'M.  1 00 

2 

c 

UtVRATf  TES1  VAU.k  A  A(lJ).N  ABS(i-J) 

7? 

c 

SEE  (31.  EXAvr  E  if 

2  i 

c 

?a 

o 

o 

Z 

75 

00  1  J.'N 

7G. 

A(l  J 

77 

1 

A{JlM(IJ) 

C
C     MAIN PROGRAM
      CALL DECOMP(N,NDIM,A,IP)
      IF(IP(N) .NE. 0)GO TO 30
      TYPE 40
   40 FORMAT(' MATRIX SINGULAR')
      STOP
   30 DO 10 J=1,N
      DO 20 I=1,N
   20 B(I)=0.0
      B(J)=1.0
C     THE JTH CALL PRODUCES IN B THE JTH COLUMN OF THE INVERSE
   10 CALL SOLVE(N,NDIM,A,B,IP)
      END
C
      SUBROUTINE DECOMP(N,NDIM,A,IP)
      REAL A(NDIM,NDIM),T
      INTEGER IP(NDIM)
      IP(N)=1
      DO 6 K=1,N
      IF(K .EQ. N)GO TO 5
      KP1=K+1
      M=K
      DO 1 I=KP1,N
      IF(ABS(A(I,K)) .GT. ABS(A(M,K)))M=I
    1 CONTINUE
      IP(K)=M
      IF(M .NE. K)IP(N)=-IP(N)
      T=A(M,K)
      A(M,K)=A(K,K)
      A(K,K)=T
      IF(T .EQ. 0.)GO TO 5
      DO 2 I=KP1,N
    2 A(I,K)=-A(I,K)/T
      DO 4 J=KP1,N
      T=A(M,J)
      A(M,J)=A(K,J)
      A(K,J)=T
      IF(T .EQ. 0.)GO TO 4
      DO 3 I=KP1,N
    3 A(I,J)=A(I,J)+A(I,K)*T
    4 CONTINUE
    5 IF(A(K,K) .EQ. 0.)IP(N)=0
    6 CONTINUE
      RETURN
      END
C
      SUBROUTINE SOLVE(N,NDIM,A,B,IP)
      REAL A(NDIM,NDIM),B(NDIM),T
      INTEGER IP(NDIM)
      IF(N .EQ. 1)GO TO 9
      NM1=N-1
      DO 7 K=1,NM1
      KP1=K+1
      M=IP(K)
      T=B(M)
      B(M)=B(K)
      B(K)=T
      DO 7 I=KP1,N
    7 B(I)=B(I)+A(I,K)*T
      DO 8 KB=1,NM1
      KM1=N-KB
      K=KM1+1
      B(K)=B(K)/A(K,K)
      T=-B(K)
      DO 8 I=1,KM1
    8 B(I)=B(I)+A(I,K)*T
    9 B(1)=B(1)/A(1,1)
      RETURN
      END

B.3  A  Prime  Number  Generator:  SIEVE2

C     REF: CACM 10,9(SEPT. 1967), P. 570
C     ALGORITHM 311: PRIME NUMBER GENERATOR 2
C     THE ALGORITHM HAS BEEN MODIFIED TO GENERATE THE
C     FIRST N PRIMES INSTEAD OF THE PRIMES <=M.
C
C     NOTE: THERE ARE 78498 PRIMES LESS THAN 10**6.
C
C     USING M=10**6 AS AN UPPER BOUND, AN UPPER BOUND OF 200
C     WILL SUFFICE FOR THE ARRAYS Q,DQ,SQ,R AND RR

C     (I.E., 2*SQRT(10**6)/LN(10**6) < 200)


II 

INTEGER  Q(700)DQ(700)5Q(200)P(200)^R(700) 

1?. 

INTEGER  P(75COO) 

13 

integer  U.JJX***.JR0N 

14. 

10GICAE  T 

13. 

P(  l)*7 

16. 

0N.7 

17. 

P(7).3 

18. 

J.3 

19. 

JJ-3 

20. 

KO 

?  1 

fl(J).3 

22 

P(3)  5 

23 

Q(3)«25 

24 

DQ(3)>  10 

23 

SQ(3)*30 

26 

READ(I7)KK 

7f 

7 

F0RMAT(GX,I5) 

28 

TrPf  3XK 

29 

3 

fORMM(  «  .  ,15) 

30 

N -/  • 

31. 

to 

T..TRUF 

32. 

DN-6  ON 

33. 

DO  70  l*3JJ 

34. 

‘R.R(t) 

35 

ir(N  M.  Q('R))GO  TO  20 

36 

Q(iR).N*DQ(iH) 

37. 

DQ('R)*SQ(iR)-DQ(lfl) 

38 

T./AISE 

39. 

IF ( 1  M  JJ)G0  TO  70 

ao. 

JJO.M 

ai 

If (|R  hi.  J)Gi  TO  20 

42. 

J*J*I 

43 

R(J)-J 

44 

0(J)*r(J)»P(J) 

45. 

SQfJ)-GeP(J) 

46. 

OQ(J).SQ(J)«(  l*(P(J),3»  7*Q<J 

47. 

20 

CONTINUE 

48 

IF  (HOT.  T)GO  TO  80 

49. 

K.K.l 

50, 

P«).N 

51. 

IF (K  IQ.  KKJSTOP 

57. 

30 

|f ( J J  IQ,  3) GO  TO  80 

53 

JJ.JJ-I 

*/\ 

IF(Q(R(JJ))  .IT.  Q(R(JJ«  1)))G0  TO  3* 

b’J. 

C 

SIFT  SORT 

56 

RR(3)«R(3) 

57 

IFUJ  .IT.  4)G0  TO  90 

58. 

DO  110  1R.4JJ 

59, 

1.19  1 

60 

40 

lF(Q(R(ifl))  -GE  Q(HR(i)))GO  TO  110 

Cl. 

RR(M).RR(») 

62 

l.l-l 

63. 

IF(I  .GE.  3)G0  TO  *0 

64. 

1  10 

RR(l»  I)-R(IR) 

63. 

c 

MERGE  SORT 

66, 

90 

U3 

67. 

tfl.3 

68. 

JR-JJ*  1 

69. 

50 

IF(Q(RR(IR))  JG T  Q(R(JR)))GO  TO  170 

70. 

R(l).RR(iR) 

71. 

IR.IR.I 

72. 

IF(IR  ,GT.  JJ)GO  TO  70 

73 

GO  TO  130 

74, 

120 

RO)-R(JR) 

73. 

JR.JR*  1 

76. 

IF  (JR  XJT.  J)GO  10  60 

77 

130 

i-M 

78. 

GO  TO  50 

79 

60 

1-1*1 

80. 

R(I)*RR(W) 

81. 

IR.IR.I 

8? 

•F(lR  .It.  JJ)GO  TO  bO 

83 

70 

JJ.3 

84. 

80 

n.n*on 

85. 

GO  TO  10 

86 

END

B.4  A  Student  Electrical  Engineering  Problem:  EE


1. 

RtAl  KJJ2 

7 

C  1 -9Gi  G 

3 

V6-I20 

a. 

Q.0800 

5, 

C2i » l, 

6 

C2U-100 

7 

C2I.I, 

H 

C?L*C2le  It  6 

3 

C7tJ-C?b»1t  6 

10. 

C2».C2  •  C  6 

1  1 

C3.I 

1? 

C3J-100. 

13 

«tAcn^)C3 

14 

PCRMAHGX/3  1) 

15 

TVPf  3  CJ 

IB 

3 

F0«MA1(  C3I  •  '}  3  1) 

17 

C3-C3i  It  6 

18 

CJU.C3U.  It  6 

19 

C3  -Cite  It -6 

20. 

C2-C21 

21. 

UMAX.O 

22. 

VM'XX-O 

2.1. 

C3'Jt  3  T.O 

2fl 

C?'itST.O 

25. 

00 

A.  (7.C2.Cl)/(2.9.Ci.C2) 

2B. 

U»LQRMO«C?*C?-C  i  *C  l)/(2efl»CI*C2) 

?; 

Q?0»V(3,'(  1  /C 1  I/C2-1/C3) 

28 

K2. 1  /(tM))#((A.B)«Q70  VO  'R.Q20/R*.  1 .  C I* I/C2)) 

29 

KI-Q20  <? 

JO 

OASt-  8  1  e(A«B)2{K?e( A  B)) 

31 

Q2TP.KI#<0ASC)ee((A.0;e(  1  /(0.O))).*2«0AStee((A.B)e(. |/(Q.B))) 

32 

V0-I2C.Q2IP/C2.Q20  C3 

33 

U-.5»(C  |•V0•yM.Q^TP.Q7TP/C2•Q2O•Q20/C3) 

34, 

»F(C7  .Ci T .  C2UG0  TO  30 

35 

’F(U  It.  UMAXJCO  TO  50 

30 

f(VO  ,IT  200)00  TO  50 

37 

VMAX.VO 

38 

UMAX.U 

33 

C30fST.C3 

40 

C2HCST.C2 

0  l. 

50 

C7.C2.C2i 

Q? 

00  to  40 

«3 

30 

C2.C2L 

|f(C3  ,CT,  C3UJG0  TO  100 

flj 

C3.C3.C3i 

C6 

CO  to  40 

a ; 

too 

C2flt3T-C20tSTeU6 

fl8 

C3itrST.C3U£ST«  1(6 

09 

TVPt  I.C20tSTX:30eST  VMAX^MAX 

50. 

1 

FC«MAT(GX  C2  1  IX  C3  ,9X,  V0  .8X  WATTS'./.CF  12.5) 

51 

tNO 

B.5  A  Generalized  Eigenvalue  Problem:  QZ


C     QZ ALGORITHM
C     REF: MOLER,C.B., AND STEWART,G.W., "AN ALGORITHM FOR
C     THE GENERALIZED MATRIX EIGENVALUE PROBLEMS",
C     SIAM J. NUMER. ANAL. 9.6(DEC. 1977).


DIMENSION  A(50.50)fl(50,50)X(50.50)AR(50). 

I  Ai(50)flT(50),IT(50) 

C  DIMENSION  RAM(SO) 

Rf  AD(  1 7000;N 

2000  F0RMAT(6X.I2) 

TVPE  200  IN 

2001  r^RMATC  N  •  .12) 

C  GEN.rvATE  TEST  DATA 

DO  20  Mil 
AR(I).I 
8T(I)«N’M 
DO  10  J-IN 
A(IJ)  .  AR(.) 

B(iJ)  -  BT(I) 

X(U)  -  O 
10  CONTINUE 

A(l.l)  .  ?  a  A(l,l) 

B(U)  -  2. a  0(1.1) 

20  CONTINUE 

CALL  Q7(5DNAfl.1  E  lARAlBT^T .TRUE.*) 

C  DO  3D  MJU 

C  «AM(I)  -  AR(i)/BT(l) 

C  PRINT  ICOlRAMO)A*(i)flT(l) 

C  00  6©  J-IN.5 

C  60  PR'NT  1 00 1 X(  J.OX  J-  1 ,l),X(  J*2.l)X(J*3.l),X(  J-6.1) 

C  1 00 1  FORMATC  \5EI2.6) 

C  30  CONTINUE 

STOP 
ENO 


      SUBROUTINE QZ(ND,N,A,B,EPS,ALFR,ALFI,BETA,ITER,
     1 WANTX,X)
      DIMENSION A(ND,ND),B(ND,ND),ALFR(N),ALFI(N),BETA(N),
     1 X(ND,ND),ITER(N)
      LOGICAL WANTX
      CALL QZHES(ND,N,A,B,WANTX,X)
      CALL QZIT(ND,N,A,B,EPS,EPSA,EPSB,ITER,WANTX,X)
      CALL QZVAL(ND,N,A,B,EPSB,ALFR,ALFI,BETA,WANTX,X)
      IF(WANTX) CALL QZVEC(ND,N,A,B,EPSA,EPSB,ALFR,ALFI,
     1 BETA,X)
      RETURN
      END


SUBROUTINE  Q?HES(NONAD.WANTX#X) 
DIMENSION  A(N0M«3(N0ND)X(NDN0) 
LOGICAL  WAN1X 
f(NOT.WANTX)  GO  TO  10 
OD  3  I- IN 
DO  2  J. IN 
X(U)  -  0 

2  CONTINUE 

X(l.l)  .  I. 

3  CONTINUE 
10  NMI  -  N  I 

00  100  L-1NMI 
11  -  L*  l 
S  -  0. 

DO  70  I.IIN 

lf(A0S(B(I.L))  JGT.  S)  S  •  ABSWU)) 
20  CONTINUE 

tf (S  IQ.  0.)  GO  TO  100 
f(ABS(0(U))  GT.  S)  S  -  ABS(B(U)) 

R  -O 

DO  25  1-LN 
8(1, L)  .  8(I,L)/S 
R  ■  R  ♦  B(U)aa2 
25  CONTINUE 

R  -  SQRT(R) 

*(8(1,1)  .LT  OO  R  -  -R 
B(L.L)  -  B(L.L)  •  « 

RHO  -  RaB(U) 

DO  50  MIN 


78. 

T  .  0. 

79. 

DO  30  I-IN 

ID. 

T  -  T  .  B(U)*B(U) 

81. 

30 

CONTINUE 

82. 

T  -  -T/tHO 

83 

DO  60  I-IN 

86. 

B(U)  -  B(U)  *  TaB(I.L) 

85. 

60 

CONTINUE 

SB. 

50 

CONTINUE 

87. 

DO  80  J- IN 

88. 

T  .  0. 

89. 

DO  60  l-LN 

90. 

T  -  T  .  B(l,L)aA(U) 

91. 

60 

CONTINUE 

92. 

T  -  -T/RHO 

93. 

DO  70  l-LN 

96 

A(IJ)  -  A(U)  .  TaB(I.L) 

95. 

70 

CONTINUE 

96. 

SO 

CONTINUE 

97. 

0(1,1)  -  -SaR 

98. 

DO  90  ULIN 

99. 

B(I,L )  -0. 

IDO. 

90 

CONTINUE 

101. 

IDO  CONTINUE 

102. 

F(N  .It.  2)  00  TO  170 

103. 

Nb.2  .  N-2 

106. 

00  160  K.IMM2 

105 

K  1  .  K.| 

106. 

Mill  -N-K-l 

107. 

DO  ISO  IB- INK  1 

108. 

L  .  N-LB 

109. 

LI  ■  M 

1  ID. 

CALL  HSH2(a(  lx)  All  1  «XM  XJ2.V  1  ,V2> 

III. 

F(UI  N€  10  GO  TO  125 

1  17. 

00  1  10  J-KN 

113. 

T  •  A(IJ)  .  U2.A(IIJ> 

1  16. 

A(U)  •  A(U)  .  T.VI 

115. 

A(IIJ)  .  A(IIJ)  .  T.V2 

116. 

1 10 

CONTINUE 

117. 

AUIX)  ■  D 

118. 

DO  120  J.LN 

119. 

T  •  BIU)  •  U2.BUIJ) 

I?0. 

B(U)  .  BUJ)  •  T.VI 

171. 

IKLUI  ■  B(LIJ)  .  T.V2 

177. 

170 

CONTINUE 

173. 

125 

CAU  HSH2IUI U  1)3(1  UIUUJ2.VI.V2) 

176. 

IFOJI  M.  U  GO  TO  ISO 

125. 

DO  130  bUt 

126. 

T  .  BOX  1)  .  U2.B0X) 

127. 

0(1.11)  •  BOXI)  >  T.VI 

178 

8(1,1)  •  BOX)  .  T.V2 

129. 

130 

CONTINUE 

1  VO. 

BUM)  ■  0. 

1 3  1 . 

DO  100  l-IN 

137. 

T  -  A(l,l  1)  •  U2.A0X) 

133. 

A(IXI)  -  A(I.LI)  .  T.VI 

136. 

A(l,l)  ■  A(IX)  •  T.V2 

135. 

160 

CONTINUE 

136. 

F(NOT.  WANTX)  GO  TO  ISO 

137. 

DO  165  l-IN 

138. 

T  .  X(IX  0  •  U2.X0X) 

139. 

XO.ll)  •  X(IXI)  .  T.VI 

160. 

X(l.l)  .  xox)  .  T.V2 

I6|. 

165 

CONTINUE 

167. 

150 

CONTINUE 

163, 

160  CONTINUE 

166, 

170  CONTINUE 

165. 

RETURN 

166. 

ENO 

167. 

c 

168. 

SUBROUTINE  Q?IT(KDNAfl£P$£PSA£PSB,ITER.WANTXlX) 

169. 

DIMENSION  A(NOND)B(NONO)X(NONO) 

150. 

DIMENSION  ITER(N) 

151. 

LOGICAL  WANTX*© 

152. 

ANORM  -  0. 

153. 

BNORM  .  0. 

156. 

DD  185  l-IN 


ITER(I)  .  0 
ami  .  a 

If  (I  ME.  I)  AN'  -  ABS{A(I.|- 1)) 

6M  -  0. 

DO  180  J-IJU 

AM  >  AM  .  ABStA(lj)) 

ONI  •  ONI  .  ABSO(Uj) 

180  CON f  INU£ 

f  (AN!  GT  ANOflM)  ANORM  .  ANl 
If  (ONI  .GT.  BNORM)  BNORM  .  BN  I 
185  CONTINUE 

EPS A  .  EPS*ANORM 
EPSQ  .  EPS.BNORM 
M  .  N 

200  if (M  .IE  2)  GO  TO  390 
00  220  LB»  I  ,M 
l  -MIB.I 

lf(l  IQ.  I)  GO  TO  2G0 
•f (A0S(A(l ,1- 1))  .IE.  EPSA)  GO  TO  230 
220  CONTINUE 
230  A(L.l  t)  •  0. 

lf(l  .IT,  M-|)  GO  TO  260 
M  .  l-l 
GO  TO  200 

260  lF(ABS(0(U))  .GT.  EPS0)  GO  TO  3C0 
0(1 .1)  •  0. 

II  -  M 

CAU  HSH2(A(t.L)A(L  I.DAJIAJ2.VI.V2) 

•F(UI  ME  I.)  GO  TO  280 
DO  270  MM 

T  -  A(lj)  .  U2« A(i  1  j) 

A(LJ)  .  A(U)  .  T.VI 
A(l  I  ,j)  .  A(l  |  J)  .  T  iV2 
T  -  D(U)  .  U2.B(UJ) 

B(U)  .  B(iJ)  «T.V  | 

B(l  U)  .  B(LIJ)  .  T.V2 
270  CONTINUE 
280  l  .  LI 

GO  TO  230 
300  M|  .Ml 
LI  •  M 
CONST  .  0.75 
ITEB(M)  .  ITEB(M'  .  | 

*f (ITER(M)  IQ  |)  GO  TO  305 
if(AnS(A(M*A  I))  .IT,  CCNST.OiDI)  GO  TO  305 
If  (ABS(A(M- 1  M-2))  .IT  C0NST.GLD2)  GO  TO  305 
If (ITER(M)  £Q.  |0)  GO  TO  310 
lf(lTCR(M)  .GT,  30)  GO  TO  380 
305  B  I  I  •  B(t.l) 

022  •  B(LUI) 

lf(ABS(0?2)  .IT.  EP$0)  B22  ■  EPSB 
031  .  B(MIMI) 

IT(A0S(033)  .LT.  tPSB)  B33  •  EPSB 
Ba4  .  0(MV) 

if(AB5(0aa)  .lt,  epsb)  aaa  .  epsb 

All  .  A(L  1 1/0  I  | 

A  12  -  A(UI)/a?? 

A2I  -  A(l  |,L)./B  1 1 
A22  .  AUUD/B22 
A33  .  A(MIMI)'B33 
A34  •  A(M|M0/B44 
A43  •  A(MMI)/933 
Aaa  •  A(MMO/Baa 
0  12  •  0(1,1  0/B22 
634  -  0(M|M)/8OO 

A  10  •  ((A33-A I  |).(A04-A  1 1)  .  A3 4. AO 3  .  A43.03O. A  I  |)/A2  I 
I  •  AI2  -  A  I  |*BI2 

A20  -  (A22-A I  I-A2  l *0  12)  -  (A33-AII)  •  (A44-A 1 1)  •  A43.B34 
A30  -  A(L*2I  l)/B22 
GO  TO  315 
310  AIO  .  0. 

A20  •  0. 

A30  •  1.1605 
318  OLD  I  •  A0S(A(MM  D) 

0102  •  A0S(A(M- 1  Jd-2)) 
r(MOT.WANTX)  10R  I  •  L 
if(WANTX)  LORI  •  I 



If  (MOT. WANT  X)  MORN  -  M 
if(WANTX)  MORN  -  N 
DO  360  K-IMI 
MiO  •  KNEW) 

K|  -  K*l 
K?  -  K*2 
K3  »  K.3 

F(K3  .GT,  M)  K3  «M 
KM  I  .  K-l 

If  (KM  I  .LT.  1)  KM  |  .1 

If (K  IQ.  1)  CALL  HSI*3(A  IOA20A30U I  JJ2JU3.V  I.V2.V3) 
lf(K.GT.l  AND.  K.LTWl) 

I  CALL  HSH3(A(KKM l)A(K IXM |)A(K2KM |)JU IJU2U3.V I.V2.V3) 
if (K  IQ.  Ml)  CALI  HSH2(A(KKM|)A(KU(M|)jljU;2.VI,V2) 
lf(UI  ME.  I.)  GO  TO  325 
DO  320  j-KMIMORN 
T  .  A(KJ)  .  U?«A(KU) 

If  (MID)  T  -  T  .  U3«A(K2«J) 

A(K J)  .  A(KJ)  •  T.VI 
A|KU)  -  A(K  I  J)  .  T.V? 

If  (MID)  A(K2J)  -  A(K?,J)  .  T.V3 
T  -  B(KJ)  .  U2.D(K|J) 

If  (MIO)  T  .  T  .  U3iB(K2J) 

B(KJ)  .  B(KJ)  .  T.VI 
B(K|j:  -  B(K|  :)  .  T.V2 
If  (MID)  B(K2J)  •  B(K2<J)  •  T.V3 
320  CONTINUE 

If (K  IQ.  1)  GO  TO  325 
A(KIK-I)  ♦  0. 
if  (MIO)  A(K2K-I)  •  6. 

325  If (K  IQ.  Ml)  GO  TO  300 

CALL  HSH3(B(K2K2)B(K2XI)H(K2X)jU|jU2JU3.V|,V2,V3) 

IF(UI  ME.  I.)  GO  TO  340 
DO  330  UL0RIX3 

T  .  A(IX2)  •  U2.A(IXI)  ♦  U3.A(IX) 

A(IX2)  •  A(IX2)  «  T.VI 
A(IXI)  -  A(IXI)  •  T.V2 
A(IX)  •  A(IX)  •  T.V3 
T  -  0(1X2)  «  U2.0OXI)  ♦  U3.B(IX) 

9(1X2)  •  0(1X2)  •  T.VI 
J(IXI)  •  0(1X1)  •  T.V? 

BCX)  •  BOX)  •  T.V3 
330  CONTINUE 

B(K?X)  •  0. 

0(K2XI)  •  0. 

If (MOT.WANTX)  GO  TO  300 
00  335  MM 

T  .  X(IX2)  «  U2.X(IXI)  ♦  U3.X(IX) 

X(IX2)  -  X(IX2)  ♦  T.VI 
X(IXI)  ■  X(IXI)  •  T.V2 
X(IX)  •  X(ix)  •  T.V3 
335  CONTINUE 

340  CALI  HSM2(B(KIXI)5(KIX)jUljU2.VI.V2) 
if (U 1  ME.  I.)  GO  TO  360 
00  350  MOR 1X3 

T  .  A(IXI)  ♦  U2.A(IX) 

A(IXI)  -  A(IXI)  ♦  T.VI 
A(IX)  •  A(IX)  ♦  T.V2 
T  -  B(IXI)  •  U2.B(IX) 

6(1X1)  •  0(1X1)  «  T.VI 
B(1X)  •  BOX)  •  T.V? 

350  CONTINUE 

B(KIX)  •  0. 

if  (MOT.WANTX)  GO  TO  360 
DO  355  MM 

T  •  X(IXI)  ♦  U2.X(IX) 

X(IXI)  -  X(IXI)  ♦  T.VI 
X(IX)  •  X(IX)  ♦  T.V2 
355  CONTINUE 
360  CONTINUE 
GO  TO  200 
380  00  385  UlM 
1TCR(I)  -  -I 
385  CONTINUE 
390  CONTINUE 
RETURN 
      END
C
      SUBROUTINE QZVAL(ND,N,A,B,EPSB,ALFR,ALFI,BETA,WANTX,X)
      DIMENSION A(ND,ND),B(ND,ND),ALFR(N),ALFI(N),BETA(N),X(ND,ND)
      LOGICAL WANTX,FLIP

3  13. 

M  .  N 

3  14. 

400 

CONTINUE 

315. 

AM  £Q  |)  GO  TO  410 

316. 

F(A(MM-|)  NE.  04  GO  TO  420 

317 

410 

ALFfl(M)  .  A(M,M) 

3  18. 

BE  TA(M)  -  B(MM) 

319. 

ale  KM)  .  0. 

320 

M  •  M-t 

3?  1 

60  TO  490 

322 

420 

l  -  M-l 

323 

AABS(0(lA))  GT  EPSB)  60  TO  425 

324 

0(U)  •  0. 

325 

call  MSH2(A(L.I)A(M.I)JU  IJJ2.VI.V2) 

326 

GO  TO  460 

327 

425 

AAOS(B<MM))  GT.  EPSB)  GO  TO  430 

328. 

0(MM)  .  0. 

329. 

CALL  HSH2(A(MM)A(M.I)JUIJJ2.VI.V2) 

33& 

BN  •  0. 

331. 

60  TO  435 

332 

430 

AN  •  A0S(A(l,l))  .  ABS(A(l AO)  .  A0S(A(M.l))  .  ABS(A(MM)) 

333. 

BN  .  ABS(B(l.L))  .  ABS(BUAO)  •  ABS(B(MM)) 

334 

All  .  A(U)/AN 

335. 

A  12  •  A(LM)/AN 

336. 

A2I  •  A(M.l)/AN 

337. 

A22  •  A(M,M)/AN 

338. 

B  1  1  -  B(l.l)/BN 

339 

B  12  •  BUM) /BN 

340. 

B22  •  B(M>0/BN 

34  1 

C  •  (A  1 1  ■822  •  A??*0  |  |  .  A2 1*0 12)/?. 

342 

0  •  (A22.0 1 1  -  A 1 1.022  •  A21*Bl2).*2/4. 

343 

1 

1  .  A2I«022>(AI2,0||  -  Al  1.012) 

344. 

AO  -IT.  0)  60  TO  480 

345. 

»F(C  6E.  03  E  -  (C  .  SQRT(O))  (Bl  I.B22) 

346. 

lf(C  .IT  0.)  E  .  (C  -  SQflT (0))/(B  1 1  aB22) 

347 

All  .A||  .  {.011 

348 

A  17  -  A 12  -  E.0I2 

349 

A22  •  A22  E.B22 

350. 

FHP  .  (ABS(A|  |).A0S(AI2))  GE.  (ABS(A2 l)*ABS(A22)) 

351. 

IF  (FLIP)  CALL  HSH2(A|2^I  IJU4JU2.V1.V2I 

352. 

IF  (.NOT /UP)  CAll  H$H2(A22.A2IJJUJ2.V|,V2) 

353 

435 

IF(UI  NE.  14  GO  TO  450 

354. 

00  440  l-IM 

355. 

T  -  A(|*0  .  U2« A(t.l) 

356. 

A(IM)  -  A(IAI)  .Vl.T 

357. 

ACM)  •  ACU)  .  V2.T 

350 

T  -  0(IM)  .  U2«B(I.L) 

359. 

O(IM)  -  BOM)  *  V  1  *T 

360. 

0(1,1)  -  0(1,1)  .  V2.T 

361. 

440 

CONTINUE 

362. 

IF  (NO  T. WANT  X)  GO  TO  450 

363 

00  445  l-IN 

364. 

T  •  X(IM)  •  U2.X(l.l) 

365. 

X(IM)  •  X(IM)  •  Vl.T 

366. 

X(U)  •  X(M)  .  V2*T 

367. 

445 

CONTINUE 

368. 

450 

IF  (BN  £Q.  0.)  60  TO  475 

369. 

FLIP  .  AN  GE.  AQS(E)aBN 

370. 

IF(FUP)  CALL  HSH2(B(L,l)B(M.L)AJ  1  JJ2.V  I.V2) 

371. 

* (NOT n IP)  CAll  HSM2(A(l.l)A(M4.)jj|iJ2,V|.V2) 

372. 

460 

IF (U 1  NE.  14  GO  TO  475 

373. 

00  470  J.LN 

374. 

T  .  A(U)  4  U2iA(M/J) 

375 

A(U)  .  A(U)  .  Vl.T 

376. 

A(MJ)  .  A(M4)  •  V2iT 

377. 

T  .  0(1  J)  *  U2*B(MJ) 

378. 

0(14)  •  0(14)  •  Vl.T 

379. 

B(MJ)  .  0(M4)  •  V2.T 

380. 

470 

CONTINUE 

381. 

475 

a(m.l)  •  a 

302. 

b(m.i)  .  a 

383. 

AlFR(l)  .  A(l.l) 

384. 

AIFR(M)  .  A(MA*) 

385. 

BETA(L)  .  b(l,l) 

386. 

BETA(M)  -  B(MM) 

387 

AIFI(M)  .  a 

388. 

AIFKl)  •  6. 

389. 

M  •  M-2 

390. 

GO  TO  490 

391. 

480 

ER  •  C/(B  1  1  »02?) 

392. 

El  -  SQRT(-O)/(01 1.022) 

393. 

A  J !5  •  All  -  ER.BI  1 

394. 

AMI  .  EI.B  1 1 

395. 

AI2R  •  A 12  -  ER.0I2 

396. 

A 121  •  EI.B  12 

397. 

A2IR  •  A2I 

398. 

A2I1  •  a 

399. 

A22R  •  A22-  ER.022 

400. 

A22I  •  EI.B2? 

401. 

FLIP  .  (ABS(A|  |R)«ABS(A|  II)»ABS(A|2R)«ABS(AI2I))  GE. 

402. 

1 

(A0S(A2  IR)*ABS(A22R).ABS(A22I)) 

403. 

IF  (FLIP)  CALL  CHSH2(AI2RAI2I.-AI  IR.-AI  1 IJCZJSZRJSZI) 

404. 

AXOT/UP)  CALL  CM8H2( A22RA22I  A2 IR.-A2 1  l£Z $NSl  1) 

405. 

FLIP  .  AN  GE.  (ABS(E R) •  A6S(E  1)) •  BN 

406. 

IF  (FLIP)  CAll  CHSH2(C7.0I  USZR.0I2JSZUBI2, 

407. 

1 

SZR.B225ZUB22CQ5Q85QI) 

408. 

IF  (.NOT  / 1  IP)  CALI  CHSH2(CZ.AI  l*SZR.AI28ZI.A|2, 

409. 

1 

CZ.A2  l*SZR.A22^Zl«  A22GQ60R6QO 

4  10. 

SSR  •  SQR.SZR  «  SQt.SZI 

411. 

SSI  -  SQR.SZt  -  SQI.SZ8 

412. 

TR  •  CQ.CZ. Al  1  .  CQ.SZR.AI2  •  SQR.C7.A2I  •  SSR.A22 

4  13. 

Tl  •  CQ.SZI.AI2  -  SQl.CZ.A21  •  SSI.A22 

414 

BOR  •  CQ.CZ. B 1 1  «  CQ.SZR.BI2  •  SSR.B22 

4  15. 

001  •  CQ.SZI.B12  •  SSI.B22 

416. 

R  •  SQRT(0OR.0OR  •  001.001) 

4  17. 

BETA(l)  .  BN.fi 

418. 

AlFR(l)  .  AN«(TRtBOR  •  TI.BOIJ/R 

419. 

AIFKl)  •  AN.lTR.0OI  -  TI.BDRJ/R 

420. 

TR  .  SSR. A 1  1  .  SQR.CZ* A 12  •  CQ.SZR.A2 1  •  CQ.CZ.A22 

421. 

Tl  .  -SEI.AII  -  SQI.CZ.AI2  •  CQ.SZI.A2I 

427. 

BOR  •  SSR.BII  •  SQR.CZ.B  12  •  CQ.CZ. B22 

421. 

001  •  SSI.BII  -  SQI.CZ.BI2 

424. 

R  •  SORT  (BOR.  BOR  •  BOl.BOl) 

425. 

BETA(M)  .  BN.R 

426. 

..LFR(M)  •  AN*(TR.BDR  ♦  TI.BOIJ/R 

427. 

AIFKM)  .  AN.(TR.0DI  •  TI.90R)/R 

42B. 

M  •  M-2 

429. 

490 

IF(M  GT.  0)  GO  TO  400 

430. 

RETURN 

431. 

ENO 

432. 

C 

433. 

SUBROUTINE  QZVEC(NDXAB£PSA£PS0,AlfR,Alf  1JKTAX) 

434. 

OIMENSION  A(NONO)B(NONO)<AlFR(N)AlFKN)B€TA(N)X(NOXO) 

435. 

LOGICAL  FLIP 

436. 

M  •  N 

437. 

500 

CONTINUE 

438 

IF  (AIFKM)  NE.  0.)  GO  TO  550 

439. 

AIFM  .  ALFR(M) 

440. 

BUM  •  BETA(M) 

441. 

IF(ABS(AIFM)  .IT.  EPSA)  AIFM  .  0. 

442. 

IF(A0S(0ETM)  .IT.  EPSB)  BETM  .  0. 

443. 

B(MM)  .  1. 

444. 

1  •  M-l 

445. 

IF (L  £Q,  0)  GO  TO  540 

446. 

510 

CONTINUE 

447. 

LI  »  1*1 

448. 

SI  -  a 

449. 

DO  515  J-LIM 

65a 

SL  •  SL  •  (0ETM.A(U).ALFM*B(U)).B(JM) 

451. 

616 

CONTINUE 

452. 

W( L  £Q.  1)  GO  TO  520 

453. 

IF(A(l£.|)  NE.  03  GO  TO  536 

454. 

526 

0  •  BETM. A(l,l)  -  ALFM*B(U) 

455. 

F(0  £Q.  03  0  •  (EPSA*£PSB)/2. 

456. 

BUM)  •  -SI A) 

457. 

L  -  L-l 

458. 

GO  TO  640 

459. 

630 

K  .  L-l 

460. 

SK  .  6. 

461. 

DO  535  J-UM 

462. 

SK  .  SK  •  (BETM.A(KJ)-ALFM.B(KJ)).B(JM) 


4G3. 

533 

CONTINUE 

4G4. 

UK  .  0E1M.A(KK)  AltM.B(KK) 

465. 

Ul  .  BtI«.A(K,l|  AlFM.BIK.) 

466. 

TU  .  BEtM.A(U) 

4G7. 

III  .  BE'M.A(U)  AlFM.B(Ll) 

468. 

D  ■  IKK. ni  IKt.UK 

4G9. 

1(D  XQ.  0)  0  ■  (EPSA.CPSB1/2 

4  70 

U(IA«)  ■  (tlK.SK  UK.Sl)/D 

47|. 

HIP  ■  ABS(UK)  G£,  ABS(IU) 

4  72 

ir(fiiP)  b(k m)  .  (SK  ■  !ki.b;iw))/ikk 

4  73. 

l(AOTXIIP)  B(KM)  .  (SI  .  Ill  .9(lM)/ltK 

474 

l  .  1-2 

475. 

540 

1(1  JGI.  0)  GO  10  510 

4  76. 

M  .  M  | 

4  77. 

GO  10  500 

4  78. 

550 

AIMS  .  AtH(U-l) 

4  79. 

AIMI  .  Alfl(M.  |) 

480 

BE I M  .  BEIA(M-I) 

48  1 

MS  .  M.| 

48?. 

Ml  .  M 

4  83 

8(M-|MR)  .  A  M,.0!MM)'IBE1M.A(MA»-1J) 

484 

B(M  1, Ml)  .  (U£!M.A(MM)-AlMS.BIMAA))/(BEIM./ 

485. 

fl(M.MS)  .  0. 

48G 

8(M Vi)  .  1 

487 

l  .  M2 

488 

1(1  XQ.  0)  GO  TO  515 

489. 

560 

CONII 'NIUE 

490. 

11  •  (.1 

491. 

SIS  •  0. 

492. 

su  .  e. 

4  n. 

DO  5G5  J.t  IM 

494. 

IS  ■  EIElM.A(lu)  •  A'MS.fl(i„) 

495. 

II  •  AiMl.B'lu) 

496. 

SlO  •  SIB  .  TS.B(J.MS)  .  t,.0(j,M) 

497 

il!  •  Sll  .  IS.B(JV')  •  TI.B(JMR) 

498. 

565 

CONUNUE 

499. 

1(i  XQ  1)  GO  10  570 

500 

1<A(l.ll)  AE  0.)  GO  10  575 

501. 

570 

OS  .  BEtM.A(U)  AiMS.B(U) 

5C2 

01  .  AlM..B(l.l) 

503 

504 

CAU  CDIV!  SIR.  SiiDS£llfl(lVS)jB(lJIAI)) 

1  ■  1  1 

605. 

l  ■  l  •  1 

GO  10  585 

50G 

575 

K  ■  l  1 

507. 

SK3  .  0. 

5C8 

SKI  .  0. 

509. 

DO  580  J-l  IM 

510. 

IS  -  BHM.A(KJ)  .  AIMS.B(KJ) 

511. 

Tl  .  AIM.B(KJ) 

512. 

SKS  .  SKS  .  IR.U(JVS)  -  Il.O(jyi) 

513. 

SKI  .  SKI  .  TPjB'.MI)  .  TI.BiJ/MS) 

514 

580 

CONIiNUE 

515. 

UKS  .  BEIM.A(KA)  ALMS.B(KA) 

516. 

IKK'  .  .ALM;»B(KX) 

517. 

ms  .  D;iM.A(K,t)  aims.b;ki) 

518. 

tkh  .  aim  .b(k,d 

519. 

I  IKS  .  BEIM.A(tX) 

520. 

IlK  .0, 

52  1. 

UlS  •  BC1M.A(  1)  A  MR.B(ll) 

522. 

UU  ■  ALM,. B(i,l) 



Ufl  -  TlCra^TliH  IKKUILU  ?KiR*MR 
Dl  •  TKXRtUt  .  TKlliUKH 

IMOR  IQ.  0  /.NO.  DI.CQO.)  CR  .  (E  PS  A. £  PSD)/?. 

CAH  COlVHlKR^S'U.UKi.  •SLR«Tk<!*3U> 

TlKR*5KI.TK^fl,SU-UiritSLR. 

D«£)iB(i  M3)B(LV0) 

HIP  •  (AOS(lKiCR).ABS(TKK]))  .ce.  ABS(UXQ) 

IHHIP)  CAU  CDIV(-SKR  TxtP.a(iVR).TKL.,BavO. 

•SKi-TXi«.Bav)  TK|>0(lM*) 

TKM.TXKIflJX^JJHK^O) 

V (SOI HP)  CAU  CDIV(.SlB-niR.Q(lK9J.Uli.flfi.Mij. 

•Sll.TUR.B{LVO-TLll*0(lMfl), 


t  •  l  2 

585  If (L  .GT,  0)  GO  10  5G0 

M  .  M-2 

596  If  (M  XjT.  0)  GO  TO  500 


UKR.Ulll3(K>«)B(KV0) 


54  0. 

M  .  N 

54  1. 

600  CONIINUE 

542 

DO  620  I.IA 

543 

s  .  0. 

644. 

00  610  J-IA* 

54  5 

S  •  S.  X(IJ).B(JM) 

546. 

610  CONTINUE 

54  7. 

X(IAI)  •  S 

648. 

620  CONTINUE 

549 

M  .  M  | 

350. 

1(M  .G1.  0)  GO  TO  600 

551 

M  .  N 

552. 

630  CONTINUE 

553. 

S  .  0. 

554 

KALE l(M)  A'E.  Go  TO  650 

635. 

00  635  1.  IN 

55G. 

R  •  ABS(X(IM)) 

637. 

1(8  .IT  S)  GO  TO  635 

568. 

s  ■  s 

539. 

0  •  xdAt) 

5G0 

635  CONTINUE 

5GI. 

DO  680  I.  IN 

5G2 

X(IM)  .  X(I7A)/D 

5G3. 

600  CONTINUE 

5G4 

M  .  M.  | 

5G5. 

GO  TO  690 

5GG 

650  DO  655  1.  IN 

5G7. 

8  •  X(IM.  1).  .2  .  x(IM)««? 

5GB 

1(R  IT.  S)  GO  TO  655 

5G9. 

S  .  S 

5/0. 

DR  •  X(IAM) 

571. 

Dl  •  X(IM) 

57?. 

655  CONTINUE 

5/3. 

00  660  1. 1 A 

5/4. 

CAI  i  CDIV(XGAl-  DX(IM)D«DIX(iA<-  l)X(IAD) 

575. 

G60  CONTINUE 

5/6. 

W  •  M-2 

577. 

690  If (M  .GT.  0)  GO  TO  630 

5  78. 

700  RLTURN  , 

579. 

END 

C
      SUBROUTINE HSH3(A1,A2,A3,U1,U2,U3,V1,V2,V3)
      IF(A2.EQ.0. .AND. A3.EQ.0.) GO TO 10
      S = ABS(A1) + ABS(A2) + ABS(A3)
      U1 = A1/S
      U2 = A2/S
      U3 = A3/S
      R = SQRT(U1*U1+U2*U2+U3*U3)
      IF(U1 .LT. 0.) R = -R
      V1 = -(U1 + R)/R
      V2 = -U2/R
      V3 = -U3/R
      U1 = 1.
      U2 = V2/V1
      U3 = V3/V1
      RETURN
   10 U1 = 0.
      RETURN
      END

C
      SUBROUTINE HSH2(A1,A2,U1,U2,V1,V2)
      IF(A2 .EQ. 0.) GO TO 10
      S = ABS(A1) + ABS(A2)
      U1 = A1/S
      U2 = A2/S
      R = SQRT(U1*U1+U2*U2)
      IF(U1 .LT. 0.) R = -R
      V1 = -(U1 + R)/R
      V2 = -U2/R
      U1 = 1.
      U2 = V2/V1
      RETURN
   10 U1 = 0.
      RETURN
      END

C
      SUBROUTINE CHSH2(A1R,A1I,A2R,A2I,C,SR,SI)
      IF(A2R.EQ.0. .AND. A2I.EQ.0.) GO TO 10
      IF(A1R.EQ.0. .AND. A1I.EQ.0.) GO TO 20
      R = SQRT(A1R*A1R+A1I*A1I)
      C = R
      SR = (A1R*A2R+A1I*A2I)/R
      SI = (A1R*A2I-A1I*A2R)/R
      R = SQRT(C*C+SR*SR+SI*SI)
      C = C/R
      SR = SR/R
      SI = SI/R
      RETURN
   10 C = 1.
      SR = 0.
      SI = 0.
      RETURN
   20 C = 0.
      SR = 1.
      SI = 0.
      RETURN
      END

C
      SUBROUTINE CDIV(XR,XI,YR,YI,ZR,ZI)
      IF(ABS(YR) .LT. ABS(YI)) GO TO 10
      WR = XR/YR
      WI = XI/YR
      VI = YI/YR
      D = 1. + VI*VI
      ZR = (WR + WI*VI)/D
      ZI = (WI - WR*VI)/D
      RETURN
   10 WR = XR/YI
      WI = XI/YI
      VR = YR/YI
      D = VR*VR + 1.
      ZR = (WR*VR + WI)/D
      ZI = (WI*VR - WR)/D
      RETURN
      END
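
The two smallest QZ support routines, HSH2 and CDIV, can be checked by hand. The following Python sketch is a transliteration made for illustration only; it is not part of the thesis, and the helper `apply_hsh2` is a hypothetical name that mirrors the three-statement update (`T = ...`, then two accumulations) used wherever the listing applies an HSH2 reflection. It assumes the statements read as in the listing above.

```python
import math

def cdiv(xr, xi, yr, yi):
    """Scaled complex division (xr + i*xi)/(yr + i*yi), following CDIV."""
    if abs(yr) >= abs(yi):
        # Divide through by yr, the larger component of the divisor.
        wr, wi, vi = xr / yr, xi / yr, yi / yr
        d = 1.0 + vi * vi
        return (wr + wi * vi) / d, (wi - wr * vi) / d
    # Otherwise divide through by yi.
    wr, wi, vr = xr / yi, xi / yi, yr / yi
    d = vr * vr + 1.0
    return (wr * vr + wi) / d, (wi * vr - wr) / d

def hsh2(a1, a2):
    """Return (u1, u2, v1, v2) as HSH2 does; u1 == 0 signals 'no reflection'.

    The FORTRAN routine leaves u2, v1, v2 unset in that case; zeros are
    returned here only so that the function always yields four values.
    """
    if a2 == 0.0:
        return 0.0, 0.0, 0.0, 0.0
    s = abs(a1) + abs(a2)
    u1, u2 = a1 / s, a2 / s
    r = math.sqrt(u1 * u1 + u2 * u2)
    if u1 < 0.0:
        r = -r
    v1 = -(u1 + r) / r
    v2 = -u2 / r
    return 1.0, v2 / v1, v1, v2

def apply_hsh2(x1, x2, u2, v1, v2):
    """Apply the reflection to the pair (x1, x2)."""
    t = x1 + u2 * x2
    return x1 + t * v1, x2 + t * v2
```

Applied to the pair that defined it, the reflection annihilates the second component, which is the property the reduction steps rely on.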
Appendix  C

Code  Matrices  for  Integer  '+' 


In order to show in detail the form of the analysis used by the
"fair" code machine language generator for the arithmetic and logical
operators (see Chapter 3, Section 3.1.3.5), this appendix contains the
code matrices for integer addition. There are two code matrices: one
for quads of the form (+,V,E,V), or (+,E,V,V) with the first two arguments
commuted, and one for quads of the form (+,E1,E2,T), where E1, E2, and
E are arguments that may be simple variables, parameters, results or
indirect results; V is a simple variable, parameter, or indirect result;
and T is a temporary result. Each matrix contains 16 cases, depending
on the mode of the operands. There are four possible operand modes:
MEM, NUM, REG, and REG+NUM. Thus, for example, the case "MEM/REG(s)"
means the first operand is in memory while the second is in register s.
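
The sixteen cases follow directly from the four operand modes. As a minimal sketch (in Python, chosen only for illustration; the function name `case_number` is hypothetical and does not appear in the thesis), the case numbers used in the matrices of this appendix can be derived from the pair of modes, with MEM/MEM as case 1 and REG+NUM/REG+NUM as case 16:

```python
# Operand modes in the order used by the code matrices.
MODES = ("MEM", "NUM", "REG", "REG+NUM")

def case_number(first_mode: str, second_mode: str) -> int:
    """Return the case number (1..16) for a pair of operand modes."""
    return 4 * MODES.index(first_mode) + MODES.index(second_mode) + 1

def case_name(first_mode: str, second_mode: str) -> str:
    """Return a heading such as '3. MEM/REG' for the pair of modes."""
    return f"{case_number(first_mode, second_mode)}. {first_mode}/{second_mode}"
```

For instance, `case_name("MEM", "REG")` gives the heading of case 3, and `case_name("REG", "MEM")` that of case 9.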


The logic of a case analysis is presented in tabular form with
the following conventions:

1)  The machine language instructions generated are
    expressed in MACRO-10 [PDP71a], the assembly
    language for the PDP-10. Curly brackets are used
    for the conditional generation of information.
    Thus, for example,

        MOVE  r,{*}E1

    means to generate a MOVE instruction with r as
    the register field, address of E1 as the address
    field, and the indirect bit set if tag bit I
    of the quad is set (see Appendix A, Section A.3).

2)  The information that controls the analysis resides
    in fields in the temp or register table, or tag
    bits in the '+' quad. The mnemonics for the tags,
    along with their meaning, can be found in Appendix
    A, Section A.3. The mnemonics and their meanings
    for fields in the register and temp table are:

        mnemonic    meaning

        RANGE       range field of temp entry
        NB          neg-bit field of temp entry
        INFO        information field of temp entry
        USES        use field of register entry

    To reference fields in the tables, an indexing
    scheme is used with the name of the operand being
    the index. For example,

        NB[E2] ← '+'

    means set the neg-bit in the temp table entry for
    E2 (a temporary) to plus.


3)  The following variables are used:

        variable    meaning

        Q           the address of the '+' quad.
        r           an unused register. This
                    register is allocated by the
                    register allocation algorithm.
                    Initially, the register has
                    no associated temporary or
                    variable.
        s,t         registers containing the operands.
        L           a literal constant, folded
                    or otherwise.
        C           address of a constant, folded
                    or otherwise.

4)  The following shorthand notation is used for table
    headings:

        symbol      meaning

        #           indirect tag bit of '+' quad set
        +           neg-bit field of temp plus
        -           neg-bit field of temp negative
        temp        temporary tag bit of '+' quad set
        cons        operand is a constant
        lit         operand is a literal
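
Two of the conventions above, conditional generation with curly brackets and table fields indexed by operand name, can be sketched as follows. This is an illustrative Python sketch, not the thesis implementation: the names `emit`, `set_field`, and `temp_table`, and the representation of the tables as dictionaries, are assumptions introduced here.

```python
def emit(opcode, reg, operand, indirect_tag=False):
    """Format one instruction.

    The '*' marker stands for the bracketed indirect marker of
    convention 1: it is generated only when the quad's indirect
    tag bit is set.
    """
    marker = "*" if indirect_tag else ""
    return f"{opcode} {reg},{marker}{operand}"

# Hypothetical temp table, indexed by operand name as in convention 2.
temp_table = {"E2": {"RANGE": 0, "NB": "-", "INFO": 0}}

def set_field(table, operand, field, value):
    """Model an update such as NB[E2] <- '+'."""
    table[operand][field] = value
```

With these definitions, `emit("MOVE", "r", "E1", indirect_tag=True)` produces the indirect form of the instruction, and `set_field(temp_table, "E2", "NB", "+")` models the update written NB[E2] ← '+'.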
As code is generated, fields in the temp and register tables must
be updated. Only those updates pertinent to the clarification of
the machine language generated are included in the logic. Some updates
are given explicitly, while others are indicated by enclosing them in
double quotes. The latter updates are:


    update                          meaning

    "associate Reg with T"          update the correct register and
    or "Reg~T"                      temp table entry to reflect that
                                    Reg contains T.

    "associate Reg with T+NUM"      update the correct register and
    or "Reg~T+NUM"                  temp table entry to reflect that
                                    Reg contains T which has mode =
                                    "REG+NUM".

    "associate Num with T"          Num is a number which resides in
    or "NUM~T"                      a temp. Associate it with the
                                    result temp T by moving in the
                                    temp table the number indicator
                                    and information fields of the
                                    temp containing Num to the
                                    corresponding fields of T.

    "negate E"                      E is a temp. Perform the following
                                    modifications to its entry in the
                                    temp table:

                                    INFO[E] ← -INFO[E]
nb[e] 
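
The quoted updates amount to small bookkeeping operations on the two tables. The sketch below, in Python for illustration only, models "associate Reg with T" and "associate Num with T". The dictionary layout and the field names `TEMP`, `MODE`, `REG`, and `NUMBER` are assumptions made here; the text names only the RANGE, NB, INFO, and USES fields.

```python
def associate_reg(reg_table, temp_table, reg, temp, mode="REG"):
    """Record that register `reg` now contains temp `temp`.

    Pass mode="REG+NUM" to model "associate Reg with T+NUM".
    """
    reg_table[reg]["TEMP"] = temp
    temp_table[temp]["MODE"] = mode
    temp_table[temp]["REG"] = reg

def associate_num(temp_table, num_temp, result_temp):
    """Copy the number indicator and information fields to the result temp."""
    for field in ("NUMBER", "INFO"):
        temp_table[result_temp][field] = temp_table[num_temp][field]
```

Both functions modify the tables in place, which matches the description of the updates as changes to existing entries rather than the creation of new ones.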


Subcases for a case are numbered using the Dewey decimal notation.
For example, if k is the number of a case, k.1, ..., k.m are its subcases,
k.1.1, ..., k.1.n are the sub-subcases of subcase k.1, etc. The logic
for a case is to be read sequentially with subcases being disjoint and
code not appearing under any subcase being common to all subcases. Some
cases are similar, and to avoid duplication, one case is transformed into
another by taking certain actions. The actions are enclosed in single
quotes and are:


    action                  meaning

    'reset temp mode'       reset the mode of the temp
                            from "REG" to "REG+NUM"
                            with the number set to zero.
                            If Q*RANGE[temp], then set
                            the USES field of its associated
                            register to 1.

    'E1 ↔ E2'               interchange the attributes
                            of the two operands.

    'same as k'             the analysis is the same as
                            for case k.

Code  Matrix  for  (+,V,E,V)  or  (+,E,V,V)  Commuted


1.  MEM/MEM

        MOVE  r,{*}E
        ADDB  r,{*}V

2.  MEM/NUM

2.1  E a constant:  same as (1)

2.2  E a literal

E=  1 

b/MO 

AOS  r,{*}V 

not  * 

# 

MOVE  1  r,E 

MOVE  r,E 

ADDB  r ,  {*}V 

2.3  E a temp with mode="NUM"


folded  cons 

1  it 

same  as  ( 1 ) 

same  as  (2.2) 

3.  MEM/REG(s)

3.1  E an indirect temp

     'reset temp mode'
     'same as (4)'

3.2  E a temp


0<RANGE[E] 

Q=RANGE[E] 

+ 

- 

+ 

ADDM  s, (*}V 

SUBM  s,{*}V 

ADDB  s,{*}V 

MOVNS  s 

MOVNS  {*} 

ADDB  s, {*}V 

I 


K 


3.3  E  not  a  temp 

no  temp  associated  with  s  I  temp  associated  with  s 


ADDM  s, 

GE[E]  and  USES[s>l  ) 


MOVE  r,s


ADD  r,C  |  ADDI  r,L  |  SUB  r,C  |  SUBI  r,L


-  MOVNS  s

"negate  L"

MOVE  r,L(s)

ADDB  r,{*}V


4.2  Q=RANGE[E] and USES[s]=1

_ temp _ 

cons  |  lit


ADD 

s,C  1 

1  ADDI 

+ 

ADDB 

s,{*}V

MOVNS 

ADDB 

_ *temp _ 

same  as  (4.1)  with 
s  replacing  r 


|  ADDB  s, {*}V  I 

5.  NUM/MEM  (impossible) 

6. NUM/NUM  (impossible) 

7.  NUM/REG(s)  (impossible) 

8.  NUM/REG(s)+NUM  (impossible)

9.  REG(s)/MEM 

9.1  V  an  indirect  temp 

'reset temp mode'

'same as (13)'

9.2  V in a register

no temp associated with s  |  temp associated with s
ADD    s,{*}E              |  MOVE  r,{*}E
MOVEM  s,{*}V              |  ADDB  r,{*}V


10.  REG(s)/NUM

10.1  V an indirect temp

'reset temp mode'

'same as (14)'

10.2  No temp associated with s

MOVEM  s,V


AOS  s,V 


not  * _ 

ADDI  s,L  ADD  s,
MOVEM  s,L


10.3  Temp  associated  with  s:  same  as  (2) 
11.  REG(t)/REG(s)

11.1  V an indirect temp

'reset temp mode'

'same as (15)'

11.2  V in a register:  same as (3)

12.  REG(t)/REG(s)+NUM

12.1  V an indirect temp

'reset temp mode'

'same as (16)'

12.2  Temp associated with t:  same as (4)

12.3  no temp associated with t


ADD  t,C  |  ADDI  t,L  |  SUB  t,C  |  SUBI  t,L
MOVEM  t,V


MOV  NS 
"nega 

t,L 
t  t ,  V

For cases 13-16, V is a temp with mode="REG+NUM" where NUM is a
literal. In order to use the literal as an index, it must be positive.
Before code for the case is generated, the neg-bit is checked.


MOVNS  V
"negate  L"


13.  REG(s)+NUM/MEM

MOVE  r,{*}E
ADDM  r,L(s)

14.  REG(s)+NUM/NUM

14.1  E a constant:  same as (13)

14.2  E a literal

E=1         |  E≠1
            |  not *        |  *
AOS  L(s)   |  MOVEI  r,E   |  MOVE  r,E
            |  ADDM  r,L(s)

14.3  E a temp with mode="NUM"

folded cons   |  lit
same as (13)  |  same as (14.2)


15.  REG(t)+NUM/REG(s)

15.1  E an indirect temp

'reset mode of temp'
'same as (16)'

15.2  E not a temp
ADDM  s,L(t)

15.3  E a temp

+             |  -
ADDM  s,L(t)  |  SUBM  s,L(t)
              |  MOVNS  L(t)


16.  REG(t)+NUM/REG(s)+NUM

16.1  Q=RANGE[E] and USES[s]=1


*+emp 

cons 

1  it 

+ 

- 

ADD  s,C 

+ 

ADDI  s, L 

— 

MOVNS  s 
"negate  L" 

MOVNS  s 

MOVE  s,L(s) 

ADDM  s,L(t) 

16.2  Q<RANGE[E] or (Q=RANGE[E] and USES[s]>1)


temp  ; 

'  *teinp 

+ 

- 

+ 

- 

MOVE 

r  *s 

MOVN 

r.s 

_ 

MOVNS  . 

cons 

1 _ L1+ 

cons 

1  it 

"negate  L 

ADD  r,C 

> 

o 

o 

”1 

1 — 

SUB  r,C 

SUB  1  r,l 

MOVE 

r, L(s) 

ADDM  r,L(+) 

C.2  Code Matrix for (+,E1,E2,T)

1.  MEM/MEM

MOVE  r,{*}E1
ADD  r,{*}E2

2.  MEM/NUM

E2 not *                  |  E2 *
MOVE  r,{*}E1             |  same as (1)
"associate NUM with T"    |

3.  MEM/REG(s) 




3.2  E2  a  temp 


Q<RANGE[E2]

£ 

- 

GE[E2] 

MOVE  r , s 

+ 

- i 

+ 

- 

ADD  s,{*}E1

SUB  s,{*}E1

ADD  r,{*}E1  SUB  r,{*}E1

NB[T]  <• 

E2  not  a  temp 

no  temp  associated  with  E2 

■  NB[E2] 

temp  associated 

with  E2 

ADD  s,{*}E1

MOVE  r,s

ADD  r,{*}E1

NB[T] ← '+'
"associate  s  with  T"


4.  MEM/REG(s)+NUM

'E1 ↔ E2'

'same as (13)'

5.  NUM/MEM

'E1 ↔ E2'

'same as (2)'

6.  NUM1/NUM2

6.1  I1 set:  same as (2)

6.2  "associate NUM1+NUM2 with T"
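Case 6.2 amounts to folding the addition at compile time: both operands are known numbers, so the sum is recorded for T instead of being computed by generated code. A minimal Python sketch follows, assuming a dictionary-based temp table with hypothetical "mode" and "info" keys, and assuming (the matrix lists no instruction for this subcase) that no code is emitted.

```python
def add_num_num(num1, num2, temp_table, t):
    """Sketch of case 6.2: associate NUM1+NUM2 with the result temp T."""
    # The result temp is recorded as a number; nothing is loaded or added
    # at run time. The keys "mode" and "info" are illustrative names only.
    temp_table[t] = {"mode": "NUM", "info": num1 + num2}
    return []   # assumed: no instructions are generated for this subcase


table = {}
code = add_num_num(3, 4, table, "T1")
```

A later quadruple that uses T then sees a NUM operand and can fold again or use the value as an immediate.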

7.  NUM/REG(s)

7.1  I1 set:  same as (3)

7.2  'E1 ↔ E2'

'same as (10)'

8.  NUM1/REG(s)+NUM2

8.1  I1 set

'E1 ↔ E2'

'same as (13)'


+ 


INFO[T] ← NUM1+NUM2


INFO[T] ← NUM2-NUM1


NB[T] ← NB[E2]


MOVNS  s
"negate  L"


MOVE  r,L(s)
"associate  NUM1  with  T"


"associate  r  with  T+NUM"


8.3  Q=RANGE[E2] and USES[s]=1

same as (8.2) except replace r with s and delete the
'MOVE r,s' instruction


9.  REG(s)/MEM

'E1 ↔ E2'
'same as (3)'


10.  REG(s)/NUM

10.1  I2 set

'E1 ↔ E2'

'same as (3)'

10.2  I1 set

'reset mode of temp'
'same as (14)'

10.3  E1 a temp


Q<RANGE[E1]

Q=RANGE[E1]

+ 

- 

+ 

- 

MOVE  r,s 

"associate 
T+NUM" 
"associate 
NB[T>'  +  ' 

MOVN  r,s 

NUM  with 

r  with  T+NUM" 

"associate  NUM 
with  T" 

"associate  s 

nbCt>nb[ei] 

"associate  -NUM 
with  T" 

with  T+NUM" 

El  not  a  tem 
associate 


"assoc,  s  with 
T+NUM" 


mode  of  reg="TEMP"  mode  of  REG="T+NUM" 


MOVE  r,s 
"assoc,  r  with 
T+NUM" 


"assoc,  s  with 
T+NUM" 


"associate  NUM  with  T" 


11.  REG(t)/REG(s)

11.1  I1 set


'reset mode of temp'
'same as (15)'


11.2  I2 set

'reset mode of temp'

'same as (15)'

11.3  E1 and E2 temps

a)  Q<RANGE[E1] and Q<RANGE[E2]

MOVE  r,t

'R ← r
E ← s
SW ← 1'

b)  Q<RANGE[E1] and Q=RANGE[E2]

'R ← s
E ← t
SW ← 0'

c)  Q=RANGE[E1] and Q<RANGE[E2]

'R ← t
E ← s
SW ← 1'

d)  Q=RANGE[E1] and Q=RANGE[E2]

'R ← t
E ← s
SW ← 1'




ADD  R,E
NB[T] ← '+'


SUB  R,E
SW=0  |  SW=1


NB[T] ← '+'  |  NB[T] ← '-'

"associate  R  with  T"

11.4  E1 a temp and E2 not a temp

'E1 ↔ E2'

'same as (11.5)'

11.5  E1 not a temp

11.5.1  temp associated with t:  same as (3)

11.5.2  no temp associated with t

E2 not a temp  |  E2 a temp

ADD  t,s  +  |  - 


SUB 

sw=o 

R»E 

SW=  1 

NB[T>'-' 

NB[T>'  +  ' 

|ADD  t,s|  SUB  t,s 
NB[T>'  +  ' 
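The choice of result register R, second operand E, and switch SW in subcases a) to d) of 11.3 can be sketched as a small Python function. This is an illustrative reading, not the thesis implementation: it assumes that Q never exceeds RANGE of a temp, so that "not Q<RANGE" means Q=RANGE, and the function and parameter names are hypothetical.

```python
def select_operands(q, range_e1, range_e2, t="t", s="s", r="r"):
    """Return (code, R, E, SW) for subcases a)-d) of 11.3.

    Assumes q <= range_e1 and q <= range_e2, so a temp whose range
    is not beyond q has q equal to its range.
    """
    e1_beyond = q < range_e1     # Q < RANGE[E1]
    e2_beyond = q < range_e2     # Q < RANGE[E2]
    if e1_beyond and e2_beyond:          # a) copy t into a fresh register r
        return [f"MOVE {r},{t}"], r, s, 1
    if e1_beyond and not e2_beyond:      # b) work in s, operands reversed
        return [], s, t, 0
    return [], t, s, 1                   # c) and d) work directly in t


code, R, E, SW = select_operands(q=3, range_e1=5, range_e2=3)
```

Only subcase b) reverses the operand roles, which is presumably why SW is recorded: a later SUB R,E needs to know which operand ended up in R in order to set the neg-bit of T.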


12.  REG(t)/REG(s)+NUM

'E1 ↔ E2'

'same as (15)'

13.  REG(s)+NUM/MEM

13.1  Q<RANGE[E1] or (Q=RANGE[E1] and USES[s]>1)

'  *tem 


ADD  r,{*}E1  |  SUB  r,{*}E1
NB[T] ← NB[E1]

"associate  r  with  T+NUM"
"associate  NUM  with  T"

13.2  Q=RANGE[E1] and USES[s]=1

same as (13.1) but replace r with s and remove the
'MOVE r,s' instruction

14.  REG(s)+NUM1/NUM2

14.1  I set:  same as (13)

14.2  'E1 ↔ E2'

'same as (8)'

15.  REG(t)+NUM/REG(s)

15.1  I set

'reset mode of temp'

'same as (16)'

15.2  E2 a temp

15.2.1  Q<RANGE[E2] and (Q<RANGE[E1] or (Q=RANGE[E1] and USES[t]>1))


E 1  temp 

1  "temp 

MOVE  r,t 

El  + 

- 

\E2 

El\ 

+ 

MOVE  r, 

F2  + 

MOV NS  t 
"negate  L" 

» L  ( t ) 

+ 

ADD  r,s 
^BTT>'  +  ' 

SUB  r,  s 

nblt>'-' 

"ass' 

SUB  r,s 

nbCt>'  +  ' 

ociate  r  1 

/CO  r ,  s 

NE[T>'-' 

with  T+NUM" 

ADD  r , s 

"assoc i al 

SUB  r  ,s 

"e  r  with  T" 

15.2.2  Q<RANGE[E2] and (Q=RANGE[E1] and USES[t]=1)

same as (15.2.1) except replace r with t and delete
the 'MOVE r,t' instruction


15.2.3  Q=RANGE[E2] and (Q<RANGE[E1] or (Q=RANGE[E1] and USES[t]>1))


E2  + 


ADD  s,+  SUB  s,t 
f  NB[T>’  +  '  NG[T>'  +  ' 
"NUM^T"  "-NUM-'J" 


MOVNS  t 
"negate  L" 


- ADD  s, L( t)  SU1B  s,L(t) 

SUB  s,t  ADD  s,t  NB[T>,  +  ’  NBC^’-’ 


NB[T>’-’  NBCT]*-’-' 
"-NUM'v/T"  "NUM^T" 


"associate  NUM  with  T" 
"associate  s  with  T" 


"associate  s  with  T+NUM"  | 

15.2.4  Q=RANGE[E2] and (Q=RANGE[E1] and USES[t]=1)


E2:  *temp             |  temp
     same as (15.2.3)  |  same as (15.2.2)

15.3  E2  not  a  temp 

15.3.1  temp  associated  with  s:  same  as  (13) 

15.3.2  no  temp  associated  with  s 


E 1  temp 

*temp 

El  + 

- 

El  + 

- 

ADD  s,t 
"NUM^T" 

SUB  s,t 
"-NUMVT" 

— 

MOVNS  t 
"negate  L" 

"s'V'T+NUM" 
NB[T>,  +  ’ 

ADD  i 

"s'VT" 

NB[T} 

s,L(t) 

<-*  +  » 

16.  REG(t)+NUM1/REG(s)+NUM2

Let  B1 ≡ (Q<RANGE[E1] or Q=RANGE[E1] and USES[t]>1) and
          (Q<RANGE[E2] or Q=RANGE[E2] and USES[s]>1)

     B2 ≡ (Q<RANGE[E1] or Q=RANGE[E1] and USES[t]>1) and
          (Q=RANGE[E2] and USES[s]=1)

     B3 ≡ (Q=RANGE[E1] and USES[t]=1) and
          (Q<RANGE[E2] or Q=RANGE[E2] and USES[s]>1)

     B4 ≡ (Q=RANGE[E1] and USES[t]=1) and
          (Q=RANGE[E2] and USES[s]=1)


L 


16.1  E1 an *temp, E2 an *temp


-  MOVNS  t 

|  "negate  El" 

16.1.1  B1

MOVE  r,L1(t)

ADD  r,L2(s)

"associate  r  with  T"
NB[T] ← '+'

16.1.2  B2

MOVE  s,L2(s)

ADD  s,L1(t)

"associate  s  with  T"
NB[T] ← '+'

16.1.3  B3

MOVE  t,L1(t)

ADD  t,L2(s)

"associate  t  with  T"
NB[T] ← '+'

16.1.4  B4:  same as (16.1.3)
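The conditions B1 to B4 partition the situations by whether each operand's register must be preserved, and subcases 16.1.1 to 16.1.4 then pick the register that receives the result. The Python sketch below illustrates that selection only. It assumes Q never exceeds RANGE and every USES count is at least 1, so that "register need not be preserved" coincides with Q=RANGE and USES=1; the function names are hypothetical.

```python
def must_preserve(q, range_e, uses):
    """Q<RANGE[E] or (Q=RANGE[E] and USES>1): the register is still needed."""
    return q < range_e or (q == range_e and uses > 1)


def classify(q, range_e1, uses_t, range_e2, uses_s):
    """Return which of B1..B4 holds (assumes q <= range and uses >= 1)."""
    keep_t = must_preserve(q, range_e1, uses_t)
    keep_s = must_preserve(q, range_e2, uses_s)
    if keep_t and keep_s:
        return "B1"
    if keep_t:
        return "B2"      # s is free to be overwritten
    if keep_s:
        return "B3"      # t is free to be overwritten
    return "B4"          # both free; 16.1.4 treats this like B3


def result_register(case):
    """Register associated with T in subcases 16.1.1 to 16.1.4."""
    return {"B1": "r", "B2": "s", "B3": "t", "B4": "t"}[case]


case = classify(q=5, range_e1=9, uses_t=1, range_e2=5, uses_s=1)
```

Under B1 a fresh register r is needed because neither t nor s may be destroyed; in the other three cases an operand register is reused and no extra register is consumed.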

16.2  E1 not an *temp

16.2.1  B1


not  an  *  temD 


MOVE  r,s 


MOVNS  s 
"negate  E2 


E2  + 


MOVNS  s 
"negate  L 


ADD  r,t  SUB  r,t 

INF0p>NUMI+NUM2  INF0[T]>-NUM2-NUMI 
NBCT>’  +  ’  NB[T>'-'  M0VE  r>L(s)  |  MOVN  r,L(s) 

SUB  r  ,t  ADD  r,t  „  ADD  r’+ 

INF0[T>NUM2-NUMI  |  NF0[T>NUM  I +NUM2  nss°ciate  r  with  T+NUMI" 
NB[T>'+'  NB[T>'-'  "associate  NUMI  with  T" 

"associate  r  with  T+NUM" 


16.2.2  B2:  same as (16.2.1) except replace r with s and remove
the 'MOVE r,s' instruction